Abstract

Ratios of universal enumerable semimeasures corresponding to hypotheses are investigated as a solution for statistical composite hypothesis testing, under the assumption that an unbounded amount of computation time is available.

Influence testing for discrete timeseries is defined using generalized structural 
equations. Several ideal tests are introduced, and it is argued that when Halting 
information is transmitted, in some cases, instantaneous cause and consequence 
can be inferred where this is not possible classically. 

The approach is contrasted with Bayesian definitions of influence, where it is 
left open whether all Bayesian causal associations of universal semimeasures are 
equal within a constant. Finally the approach is also contrasted with existing 
engineering procedures for influence and theoretical definitions of causation. 

Keywords: algorithmic information transfer - Halting information - composite hypothesis testing - causation - influence testing

Introduction 

The paper introduces the tools needed to define and investigate general-purpose influence tests for discrete timeseries, in the ideal case where an unbounded amount of computation time is available.

In statistics, a simple hypothesis is a hypothesis that contains enough information to infer a unique probability distribution over all measurement data that could be expected. Comparing two simple hypotheses on given data is well understood [22]. In many cases, too little useful information is available to construct such a unique probability distribution, and there is no generally accepted solution for the problem of composite hypothesis testing. It is argued here that the ratio test for universal semimeasures can theoretically define a solution for composite hypothesis testing.

An extensive literature exists on the definition of 'influence', both in statistics and in philosophy [3 [T7J [2TJ |^ [31]. However, most of this work defines influence only when a fixed distribution is already available. General-purpose statistical tests differ from simple statistical tests in that they can infer a useful notion of influence without reference to a given semimeasure. No theory is available that considers both the statistical interpretation and the computability aspects, either to define ideal influence tests or to interpret practical algorithms such as [10] [15] [El [21] [26] [28] [30] [32].
Causality is often related to structural equations [25]. Traditionally, computable functions are used to study these generalized structural equations. However, the set of semimeasures corresponding to these structural equations does not have a universal element. To solve this problem, structural equations with partial computable functions will be considered.

Overview of results. In section 1, composite hypothesis testing using universal semimeasures is discussed. It is shown that if a composite hypothesis is associated with a set of semimeasures that is "testable" and is a product of convex sets of semimeasures, it has a universal enumerable semimeasure among its enumerable semimeasures. This result will be applied to the hypothesis of independence and to the hypothesis that timeseries are influence-free.

In section 2, different hypotheses of influence-free and causal timeseries are defined. All corresponding sets of semimeasures have universal elements, and their values can differ substantially if Halting information is instantaneously transmitted between two timeseries. Which value applies depends on whether the instantaneous information is assumed to originate from a hidden source, or to be instantaneously transmitted from the first signal to the second or vice versa.

In section 3, causal semimeasures that are Bayesianly associated with enumerable semimeasures are introduced; these form a superset of the enumerable causal semimeasures defined in section 2. It is shown that they do not have a universal element. However, for the causal semimeasures associated with some universal element, it is left open whether such a universal element exists. The results are summarized in figure 3.

The logarithms of the proposed ideal statistical tests define an algorithmic variant of the Shannon information transfer. In section 4 both quantities are related, and therefore an alternative interpretation of the algorithms in HSl HH [S2 can be given. Granger causality can also be interpreted in this framework. Finally, the proposed test for influence is contrasted with the Shannon information transfer of the minimal sufficient statistic, and with graphical representations of minimal sufficient statistics as in [TS]. It is shown that enumerable algorithmic information transfer determines plausible causal relations where these relations cannot be determined from probabilistic minimal sufficient statistics.

Definitions and notation. Let $\omega$ be the set of natural numbers, and $\epsilon$ the empty string. The binary strings $2^{<\omega}$ of finite length can be associated with $\omega$. Let $l(x)$ denote the length of $x$ in its binary expansion, and let $2^n$ be the set of strings $x$ with $l(x) = n$. Let $\omega^n$ be the set of $\omega$-sequences of length $n$. Let $\Phi$ be a universal Turing machine. $\Phi_t(p, x)\downarrow = y$ means that on input $p, x$, $\Phi$ outputs $y$ and halts in less than $t$ computation steps. Let $2^\omega$ denote Cantor space, the space of infinite binary sequences. $2^\omega$ can be associated with the real numbers in $[0,1]$.¹ For $r \in [0,1]$, $r^k$ are the first $k$ binary digits in the binary expansion of $r$. For $x \in 2^{\le\omega}$, $x^k$ denotes $x_1 x_2 \dots x_k$. A real function $f : \omega \to [0,1]$ is computable if there is a string $p$ such that for all $k, x$: $\Phi(p, x, k)\downarrow = f(x)^k$. An enumeration of a one-argument real function $f(x)$ is a two-argument computable rational function $g(x,t)$ such that for all $t$: $g(u,t) \le g(u,t+1)$ and $\lim_t g(u,t) = f(u)$. With abuse of notation, an enumeration of $f$ is denoted as $f_t$. A function $f$ is enumerable, respectively co-enumerable, iff $f$, respectively $-f$, has an enumeration. A semimeasure $P$ is a non-negative real function that satisfies $\sum\{P(x) : x \in \omega\} \le 1$. A semimeasure $P$ (multiplicatively) dominates a semimeasure $Q$, notation: $P \ge^* Q$, if a constant $c$ exists such that for all $x$: $cP(x) \ge Q(x)$ [20]. $P =^* Q$ if $P \ge^* Q$ and $Q \ge^* P$. A set $S$ of semimeasures has a universal element $m$ if $m \in S$ and $m$ dominates all semimeasures in $S$. A function $f : \omega \to \omega$ (additively) dominates a function $g : \omega \to \omega$, notation: $f \ge^+ g$, iff there is a constant $c$ such that for all $x$: $f(x) + c \ge g(x)$. $f =^+ g$ iff $f \ge^+ g$ and $g \ge^+ f$. For some of the proofs prefix-free Kolmogorov complexity is needed, which is defined in the appendix.

¹ In this association, for example, the number 0.5 is associated both with 0111... and with 1000....



1 Ideal Bayesian composite hypothesis testing 

This section describes scientific hypothesis testing and defines a notion of significance for the ratio test of universal semimeasures. Proposition 1.2 shows the existence of universal semimeasures for a variety of hypotheses.



1.1 What do hypothesis tests do?

Scientific modeling is the process of making rules for symbols that represent observ- 
ables or properties of observables. These rules can be iteratively applied and combined 
to reproduce past observations or predict and control future observations in specific 
contexts. In scientific modeling, one often first infers rules from a restricted context (inference), and conjectures that they apply to more contexts (generalization).

Scientists agree or disagree on the applicability of rules and models in different contexts. When a rule is under discussion, the rule is called a hypothesis. If scientists agree on the applicability of hypotheses and models, science is advancing. When probabilities are involved in a model, a hypothesis can imply a semimeasure over all possible expected observations in the context of an experiment. A hypothesis that implies such a semimeasure is called a simple hypothesis. The applicability of the hypothesis in the restricted context of the experiment is often discussed through significance or hypothesis tests applied to the observations of an experiment, which are called data. Such tests define the significance, or probability of type I error, and the sensitivity, or probability of type II error (one minus the power). The test rejects, respectively fails to reject, the zero hypothesis if the significance is below, respectively above, a predefined value. The test favors the alternate hypothesis if the sensitivity is also below a predefined value.

In scientific models, probabilities arise when in a context some variables are not 
observable or some variables are beyond control. Three types of probabilities can be 
obtained, either 

• from repetitive observations of data. (Frequentistic probabilities) 

• from rules about the variables that are not subject to discussion. (Objectivistic 
probabilities) 

• from an unknown observer-dependent model. (Subjectivistic probabilities) 

Significance and sensitivity are probabilities. If the probabilities of the hypothesis are obtained frequentistically or objectively, then at least significance and sensitivity can have the same interpretation. In a frequentistic setting, the significance of a statistical test is the maximal limit fraction of repeated evaluations of the test in which $H^0$ is rejected, in a context where $H^0$ describes the observed data well; the sensitivity is the maximal limit fraction of repeated evaluations in which $H^0$ is not rejected, in a context where $H^A$ describes the observed data well. In the objectivistic setting, the significance, respectively sensitivity, is the objective prior probability that the zero hypothesis, respectively the alternate hypothesis, disqualifies itself.

More formally, let a one-sided statistical test $d(x)$ be high for data $x$ that seem to contradict the zero hypothesis. Assume that the zero and alternate hypotheses are simple, with probability distributions $P^0$ and $P^A$. Significance and sensitivity for data $x$ according to $d(x)$ are given by:

$$\alpha(x) = \sum\{P^0(y) : d(y) \ge d(x)\},$$

$$\beta(x) = \sum\{P^A(y) : d(y) < d(x)\}.$$

If $\alpha(x)$ is small, a scientist will conclude that either a rare event has occurred or that $P^0$ is not representative for $x$. In practice, he will reject the zero hypothesis. If $\beta(x)$ is also small, he will favor the alternate hypothesis.
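As an illustration, the two sums above can be evaluated directly on a finite sample space. The Python sketch below is only a toy: the two-bit sample space, the distributions `P0` and `PA`, and the statistic `d` (the number of ones) are hypothetical choices, and using a strict inequality for $\beta$ is an assumption of this sketch.

```python
from fractions import Fraction

# Hypothetical toy example: all strings of two bits, with exact rational probabilities.
P0 = {"00": Fraction(1, 4), "01": Fraction(1, 4), "10": Fraction(1, 4), "11": Fraction(1, 4)}
PA = {"00": Fraction(1, 10), "01": Fraction(1, 10), "10": Fraction(1, 10), "11": Fraction(7, 10)}


def d(y):
    """Hypothetical one-sided statistic: the number of ones in y."""
    return y.count("1")


def significance(p0, stat, x):
    """alpha(x): P^0-mass of all outcomes y with stat(y) >= stat(x)."""
    return sum(p for y, p in p0.items() if stat(y) >= stat(x))


def sensitivity(pa, stat, x):
    """beta(x): P^A-mass of all outcomes y with stat(y) < stat(x) (strict, by assumption)."""
    return sum(p for y, p in pa.items() if stat(y) < stat(x))


print(significance(P0, d, "11"), sensitivity(PA, d, "11"))  # 1/4 3/10
```

For the observation `"11"`, both quantities are small. A scientist using this toy test would therefore reject $P^0$ and favor $P^A$.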

A specific choice of such a statistical test $d$ is given by the likelihood ratio $P^A(x)/P^0(x)$. This ratio also has an interpretation within Bayesian statistics. Suppose $a^A/a^0$ represents the ratio of the prior belief in the hypothesis corresponding to $P^A$ to the belief in the hypothesis corresponding to $P^0$. Then, after observing data $x$, the posterior ratio of the beliefs is:

$$\frac{a^A P^A(x)}{a^0 P^0(x)}.$$

Due to the coding theorem [20], this Bayesian interpretation is also justified by an Occam's razor argument that favors the hypothesis that can be described with minimal code length.

The Neyman-Pearson lemma states that $\beta \circ \alpha^{-1} : [0,1] \to [0,1]$ is uniformly optimal (minimal) for

$$d(x) = P^A(x)/P^0(x).$$

This means that there is a test that has an optimal sensitivity for every significance. It shows that optimal hypothesis testing is equivalent to likelihood ratio testing. Remark that for this test, the significance and the sensitivity are bounded by $P^0(x)/P^A(x)$ and $P^A(x)/P^0(x)$ respectively.
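The bounds in the last remark follow from Markov's inequality, because the likelihood ratio has expectation at most 1 under $P^0$. The Python sketch below checks them exhaustively on a hypothetical two-bit example. The distributions are illustrative assumptions, and $P^0$ is taken strictly positive so that the ratio is always defined.

```python
from fractions import Fraction

# Hypothetical simple hypotheses on two-bit strings (P0 strictly positive).
P0 = {"00": Fraction(1, 4), "01": Fraction(1, 4), "10": Fraction(1, 4), "11": Fraction(1, 4)}
PA = {"00": Fraction(1, 10), "01": Fraction(1, 10), "10": Fraction(1, 10), "11": Fraction(7, 10)}


def lr(y):
    """Likelihood ratio statistic d(y) = P^A(y) / P^0(y)."""
    return PA[y] / P0[y]


def alpha(x):
    """Significance of the likelihood ratio test at observation x."""
    return sum(P0[y] for y in P0 if lr(y) >= lr(x))


def beta(x):
    """Sensitivity (type II error) of the likelihood ratio test at observation x."""
    return sum(PA[y] for y in PA if lr(y) < lr(x))


def bounds_hold():
    """Check alpha(x) <= P^0(x)/P^A(x) and beta(x) <= P^A(x)/P^0(x) for every x."""
    return all(alpha(x) <= P0[x] / PA[x] and beta(x) <= PA[x] / P0[x] for x in P0)


print(bounds_hold())  # True
```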

A composite hypothesis is a collection of rules that imply a set of semimeasures. There is no accepted, generally optimal way of extending simple hypothesis testing to composite hypothesis testing. Many methods have been proposed in the literature. Some are theoretically optimal under certain conditions j^; others have been found empirically useful in specific contexts. Let $H^0$ and $H^A$ be the sets of semimeasures constituted by the zero and alternate hypotheses.

• Uniformly optimal test: In specific cases, there is a test that has an optimal $\beta \circ \alpha^{-1}$ function for all combinations of semimeasures in $H^0$ and $H^A$.

• Bayesian approaches: Assign some fixed prior probability to all semimeasures; this reduces the problem to simple hypothesis testing. Often it is not possible to extend the hypothesis with an acceptable prior, and therefore this is a subjective method.

• Generalized maximal likelihood: This is the likelihood ratio of the best-case hypotheses:

$$\frac{\max\{P(x) : P \in H^A\}}{\max\{P(x) : P \in H^0\}}.$$

This is the most commonly used method. In specific cases this method is proved to be optimal, but in other cases it has problems or is subject to discussion [9].
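For a simple composite example, the generalized maximal likelihood ratio can be computed in closed form. In the sketch below, $H^0$ contains only the fair coin and $H^A$ contains all i.i.d. Bernoulli($p$) sources. This setup is an illustrative assumption, not a construction from the text. The maximum in the numerator is then attained at the empirical frequency $\hat p = k/n$.

```python
def bernoulli_likelihood(bits, p):
    """Probability of the bit sequence under an i.i.d. Bernoulli(p) source."""
    k = sum(bits)
    n = len(bits)
    # Python evaluates 0.0 ** 0 as 1.0, so p = 0 or p = 1 is handled correctly.
    return (p ** k) * ((1 - p) ** (n - k))


def generalized_lr(bits):
    """max{P(x) : P in H^A} / max{P(x) : P in H^0} for H^0 = {Bernoulli(1/2)}
    and H^A = {Bernoulli(p) : 0 <= p <= 1}."""
    p_hat = sum(bits) / len(bits)  # maximum likelihood estimate under H^A
    return bernoulli_likelihood(bits, p_hat) / bernoulli_likelihood(bits, 0.5)


print(generalized_lr([1, 1, 1, 1]))  # 16.0
print(generalized_lr([1, 0, 1, 0]))  # 1.0
```

Because $H^0 \subseteq H^A$ here, the ratio is always at least 1. This is one of the situations where the method's interpretation is subject to discussion.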

Suppose that a composite zero hypothesis $H^0$ and alternate hypothesis $H^A$ have universal semimeasures $m^0$ and $m^A$. If the sets of semimeasures are convex and countable, $H^i = \{P^i_1, P^i_2, \dots\}$ for $i = 0, A$, the universal semimeasures $m^i$ satisfy:

$$m^i =^* \sum_j a_j P^i_j. \qquad (1)$$
This means that hypothesis selection by the likelihood ratio

$$d(x) = m^A(x)/m^0(x)$$

can be considered a Bayesian approach to composite hypothesis selection. All universal semimeasures of a set are equal up to a constant factor, so the subjectivity is limited to a constant factor. Also, $m^0 \in H^0$ and $m^A \in H^A$, and both are multiplicatively optimal. Therefore the test can be considered generalized maximal likelihood testing, if one can neglect the constant factors.
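The universal semimeasures themselves are not computable, so the ratio $m^A(x)/m^0(x)$ can only be illustrated with computable stand-ins. The Python sketch below is such a stand-in under explicit assumptions. Each composite hypothesis is replaced by a finite grid of computable models: hypothetical i.i.d. Bernoulli sources for $H^0$ and hypothetical first-order Markov chains for $H^A$. Each mixture uses equal weights $a_j$, in the spirit of (1).

```python
from itertools import product

# Hypothetical parameter grid shared by both finite families of models.
GRID = [0.1, 0.3, 0.5, 0.7, 0.9]


def iid_prob(x, p):
    """Probability of bit list x under an i.i.d. Bernoulli(p) source."""
    prob = 1.0
    for b in x:
        prob *= p if b == 1 else 1 - p
    return prob


def markov_prob(x, p_first, p_after0, p_after1):
    """Probability of a nonempty bit list x under a first-order Markov chain;
    p_after0 / p_after1 is the probability of a 1 after a 0 / after a 1."""
    prob = p_first if x[0] == 1 else 1 - p_first
    for prev, cur in zip(x, x[1:]):
        q = p_after1 if prev == 1 else p_after0
        prob *= q if cur == 1 else 1 - q
    return prob


def mixture(probs):
    """Equal-weight mixture: a_j = 1/J for J models, so the weights sum to 1."""
    return sum(probs) / len(probs)


def m0(x):
    return mixture([iid_prob(x, p) for p in GRID])


def mA(x):
    return mixture([markov_prob(x, *params) for params in product(GRID, repeat=3)])


def ratio_test(x):
    """Computable stand-in for d(x) = m^A(x) / m^0(x)."""
    return mA(x) / m0(x)


print(ratio_test([0, 1] * 10) > 1)  # True: alternation favors the Markov family
print(ratio_test([0] * 20) < 1)     # True: unused parameters cost the Markov mixture
```

On a constant sequence the i.i.d. mixture wins, even though every i.i.d. model is also a Markov model. The Markov mixture spreads its weight over transition parameters that the data never use, which mirrors the Occam's razor reading of the ratio test.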

The significance of $d$ relative to $m^0$ does not have a direct frequentistic or objective interpretation. In general, a repetitive experiment with controlled and uncontrolled variables in the environment cannot frequentistically evaluate to $m^0$ or $m^A$. Objectively, neither $m^0$ nor $m^A$ is guaranteed to become the accepted semimeasure for the context under discussion. Assume $H^0$ and $H^A$ satisfy the conditions of Proposition 1.2. Any choice of the positive constants $a_i$ in (1) results in a universal semimeasure. Without loss of generality, the constants $a_i = 1/(i(\log i)^2)$ (for $i \ge 2$) can therefore be chosen. Let $i \le k/(\log k)^2$; then $i(\log i)^2 \le k$, and therefore:

$$\frac{P^0_i(x)}{m^A(x)} \le \frac{k\, m^0(x)}{m^A(x)}.$$

If $d(x)$ is large, so that this bound is small, this can mean that:

• Some complex model from the zero hypothesis describes the data.

• The alternate hypothesis $m^A$ better describes the data.

• A rare event has occurred.
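The step from $i \le k/(\log k)^2$ to $i(\log i)^2 \le k$ only uses $\log i \le \log k$. The Python check below assumes base-2 logarithms, since the text does not fix the base. With $a_i = 1/(i(\log i)^2)$, every model $P^0_i$ with index $i \le k/(\log k)^2$ has weight at least $1/k$. Hence $P^0_i(x) \le k\, m^0(x)$ whenever $m^0 \ge a_i P^0_i$.

```python
import math


def inverse_weight(i):
    """1 / a_i = i (log2 i)^2, defined for i >= 2."""
    return i * math.log2(i) ** 2


def max_index(k):
    """Largest integer i with i <= k / (log2 k)^2."""
    return math.floor(k / math.log2(k) ** 2)


def bound_holds(k):
    """Check i (log2 i)^2 <= k for every admissible index 2 <= i <= k / (log2 k)^2."""
    return all(inverse_weight(i) <= k for i in range(2, max_index(k) + 1))


print(all(bound_holds(k) for k in range(4, 5000)))  # True
```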

In many cases the first interpretation must be partly taken into account, so one should look for a separate notion of significance for the statistic $d(x)$. For example, in [24] a frequentistic significance bound for the Shannon information transfer statistic is obtained by a permutation test.
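A permutation test of this kind can be sketched as follows. The sketch rests on several assumptions and is not the procedure of [24]. The statistic is a plug-in estimate of lag-1 Shannon transfer entropy from $x$ to $y$ on binary series. The null distribution is obtained by shuffling $x$, which destroys any dependence of $y$ on the past of $x$.

```python
import math
import random
from collections import Counter


def transfer_entropy(x, y):
    """Plug-in estimate, in bits, of lag-1 transfer entropy from series x to series y."""
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))  # (y_{t+1}, y_t, x_t)
    pairs_yx = Counter(zip(y[:-1], x[:-1]))         # (y_t, x_t)
    pairs_yy = Counter(zip(y[1:], y[:-1]))          # (y_{t+1}, y_t)
    singles = Counter(y[:-1])                       # y_t
    n = len(y) - 1
    te = 0.0
    for (y1, y0, x0), c in triples.items():
        p_full = c / pairs_yx[(y0, x0)]             # p(y_{t+1} | y_t, x_t)
        p_self = pairs_yy[(y1, y0)] / singles[y0]   # p(y_{t+1} | y_t)
        te += (c / n) * math.log2(p_full / p_self)
    return te


def permutation_pvalue(x, y, n_perm=200, seed=0):
    """Fraction of shuffled copies of x whose statistic reaches the observed one
    (with the usual +1 correction, so the p-value is never exactly 0)."""
    rng = random.Random(seed)
    observed = transfer_entropy(x, y)
    hits = 0
    for _ in range(n_perm):
        shuffled = list(x)
        rng.shuffle(shuffled)
        if transfer_entropy(shuffled, y) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


rng = random.Random(1)
x = [rng.randint(0, 1) for _ in range(200)]
y = [0] + x[:-1]  # y copies x with a delay of one step
print(transfer_entropy(x, y), permutation_pvalue(x, y))
```

The p-value obtained this way is frequentistic: it refers to repeated shufflings of the data, not to $m^0$.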



1.2 Universal semimeasures for a composite hypothesis. 

Semimeasures are used instead of measures, since the set of computable or enumerable measures has no universal element. The set of co-enumerable semimeasures also has no universal element [H HH].

A positive real function $P$ is a length conditional semimeasure if for all $n$:

$$\sum\{P(x) : x \in 2^n\} \le 1.$$

The use of length conditional semimeasures reduces technical details. They can also be justified by the observation that in many experimental setups, the amount of generated data is fixed before the experiment starts. From now on, "semimeasure" is short for length conditional semimeasure.
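To see the difference between the two notions, consider $P(x) = 2^{-l(x)}$. It assigns total mass 1 to every length $n$, so it is a length conditional semimeasure. Its total mass over all of $2^{<\omega}$, however, is infinite. The Python check below verifies this for small $n$; the name `P` is introduced here for illustration.

```python
from fractions import Fraction
from itertools import product


def P(x):
    """Uniform length conditional semimeasure: P(x) = 2^(-l(x))."""
    return Fraction(1, 2 ** len(x))


def mass_at_length(n):
    """Sum of P(x) over all x in 2^n."""
    return sum(P("".join(bits)) for bits in product("01", repeat=n))


# Mass 1 at each length, so the mass over lengths 0..8 already exceeds 1.
print(sum(mass_at_length(n) for n in range(9)))  # 9
```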

Definition 1.1. Let $S$ be a set of semimeasures:

• $S^e$ is the subset of enumerable semimeasures in $S$.

• $S$ is testable iff there is a computable logic expression $L$ such that for any semimeasure $P$: $P \in S$ iff some rational approximation $P_t$ of $P$ satisfies:

$$\forall t, n \le t : L(P^n_t),$$

where $P^n_t$ is the finite restriction of $P_t$ to $2^n$.
• $S$ is convex iff for any $P, Q \in S$ and $a, b \in [0,1]$ with $a + b \le 1$: $aP + bQ \in S$.

• The product set of two sets of semimeasures $S, T$ is given by

$$S \times T = \{PQ : P \in S \wedge Q \in T\}.$$

Remark that the product of two semimeasures is also a semimeasure.
Proposition 1.2. Let $S, T$ be sets of semimeasures.

(i) If $S$ is testable and contains $P_0 = 0$, then $S^e$ is enumerable.

(ii) If $S$ is convex and $S^e$ can be enumerated as $P_1, P_2, \dots$, then $S^e$ contains the universal semimeasure

$$m^S = \sum_i a_i P_i,$$

where $a_i > 0$ is any computable real function such that $\sum_{i \in \omega} a_i \le 1$.

(iii) If $S^e, T^e$ have universal elements $m^S, m^T$, then $m^S m^T$ is a universal element for $S^e \times T^e$.

Proof. The first two items of the proposition are a direct generalization of the proof of the existence of universal enumerable semimeasures [T^ UHl [2U].

Part (i). Define the enumeration $P_{i,t}$: let $P_{i,0}(x) = 0$ for all $i, x$. Remark that $P_{i,0} \in S$. For all $t$ let

$$P_{i,t}(x) = \max\{\Phi_t(i, x, s) : s \le t \wedge \Phi_t(i, x, s)\downarrow\},$$

if $\sum\{P_{i,t}(x) : x \in 2^n\} \le 1$ and $L(P^n_{i,t})$ is true; otherwise let for all $x$:

$$P_{i,t}(x) = P_{i,t-1}(x).$$

Remark that for all $i, t$, $P_{i,t}(x)$ is computable, and that for every $Q \in S^e$ there is an $i$ such that $Q = P_i$.

Part (ii). Let:

$$m^S_t = \sum\{a_i P_{i,t} : i \le t\}.$$

Remark that $m^S_t$ is computable and increases with $t$; therefore $m^S$ is enumerable. Remark that by convexity, $m^S_t \in S$ for all $t$, and that for any $n$ the values $m^S_t(w)$ with $w \in 2^n$ remain constant for $t$ large enough. Therefore the limit is also in $S^e$. Finally, remark that $m^S$ dominates all $P_i$.

Part (iii). Clearly $m^S m^T \in S^e \times T^e$. Let $R \in S^e \times T^e$. It remains to show that $R \le^* m^S m^T$. There exist $P \in S^e$, $Q \in T^e$ such that $R = PQ$. Since $P \le c_P m^S$ and $Q \le c_Q m^T$, we have that

$$R = PQ \le c_P c_Q\, m^S m^T.$$

□
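The construction in part (ii) can be mimicked with a hypothetical family of enumerations. In the Python sketch below, `approx(i, t, x)` stands in for $P_{i,t}(x)$: an i.i.d. Bernoulli($1/(i+1)$) probability scaled by $1 - 2^{-t}$, so that it increases in $t$. The weights are $a_i = 1/(i(i+1))$, which are computable and sum to 1. Both choices are assumptions made only for illustration. The stage-$t$ mixture $m_t$ is then nondecreasing in $t$ and dominates every $a_i P_{i,t}$ with $i \le t$.

```python
from fractions import Fraction
from itertools import product


def weight(i):
    """a_i = 1/(i(i+1)): computable, positive, and summing to 1 over i >= 1."""
    return Fraction(1, i * (i + 1))


def approx(i, t, x):
    """Hypothetical enumeration P_{i,t}(x): the i.i.d. Bernoulli(1/(i+1))
    probability of the bit string x, scaled by (1 - 2^-t) so it increases in t."""
    q = Fraction(1, i + 1)
    prob = Fraction(1)
    for b in x:
        prob *= q if b == "1" else 1 - q
    return (1 - Fraction(1, 2 ** t)) * prob


def mixture(t, x):
    """Stage-t approximation m_t(x) = sum over i <= t of a_i P_{i,t}(x)."""
    return sum(weight(i) * approx(i, t, x) for i in range(1, t + 1))


def total_mass(t, n):
    """Sum of m_t(x) over x in 2^n; stays <= 1 because the a_i sum to at most 1."""
    return sum(mixture(t, "".join(bits)) for bits in product("01", repeat=n))


print(all(mixture(t, "0110") <= mixture(t + 1, "0110") for t in range(1, 10)))  # True
```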

From Proposition 1.2 it follows that the sets of univariate, bivariate and conditional enumerable semimeasures have universal elements, denoted as $m(x)$, $m(x,y)$ and $m(x|y)$. The set of independent enumerable semimeasures is given by $P(x,y) = Q(x)R(y)$, for univariate semimeasures $Q, R$. This set satisfies the conditions of item (iii) of Proposition 1.2 and therefore has the universal element $m(x)m(y)$. Also remark that by Corollary 2.6 there are sets $S, T$ such that the universal element of $S^e \times T^e$ can be a factor $n/\log n$ (up to a multiplicative constant) lower than the universal element of $(S \times T)^e$.

By the coding theorem, $-\log m(x) =^+ K(x)$ and $-\log m(x,y) =^+ K(x,y)$. For this reason, $I(x;y) = K(x) + K(y) - K(x,y)$ naturally appears as a notion of confidence for ideal independence [19].
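The quantity $I(x;y)$ is not computable. A common heuristic replaces $K$ by the length of a compressed encoding. The Python sketch below uses zlib purely as an assumed, crude, computable upper-bound proxy. It is not part of the theory above, and the resulting numbers only loosely track the ideal quantity; for instance, they can be slightly negative.

```python
import random
import zlib


def c(data: bytes) -> int:
    """Length in bits of the zlib encoding: a crude computable proxy for K(data)."""
    return 8 * len(zlib.compress(data, 9))


def mutual_information_proxy(x: bytes, y: bytes) -> int:
    """Compression proxy for I(x;y) = K(x) + K(y) - K(x,y), with K(x,y) ~ c(x + y)."""
    return c(x) + c(y) - c(x + y)


rng = random.Random(0)
x = bytes(rng.getrandbits(8) for _ in range(2000))
z = bytes(rng.getrandbits(8) for _ in range(2000))  # generated independently of x
print(mutual_information_proxy(x, x), mutual_information_proxy(x, z))
```

For a string paired with itself the proxy is close to the compressed length of the string. For two independently generated random strings it stays near zero, which matches the intuition behind ideal independence testing.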

2 Influence-free and causal semimeasures 

This section derives several influence tests from generalized structural equations, both for pairs of discrete variables and for pairs of discrete timeseries. It is shown that when Halting information is present in two observations $x, y$, the universal elements obtained from the structural equation hypotheses can imply slightly different likelihoods, depending on whether $x$ is assumed to cause $y$ or $y$ is assumed to cause $x$. When $x, y$ represent discrete timeseries, the difference in likelihood can become significant, depending on whether $x$ is assumed to instantaneously cause $y$ or $y$ is assumed to instantaneously cause $x$.

2.1 Statistical explanatory model 

First, the concept of a statistical explanatory model is discussed within the computability framework.

Cantor space $2^\omega = [0,1]$ with the tree topology is assumed, that is, for every $r \in 2^{<\omega}$:

$$[r] = \{a \in 2^\omega : r \sqsubseteq a\},$$

with $r \sqsubseteq a$ meaning that $r$ is a prefix of $a$. The measure is given by $\lambda([r]) = 2^{-l(r)}$.

Let $X \in \omega$ denote a discrete observable. A statistical explanatory model for $X$ consists of two parts. The first is some unobservable or uncontrolled variable $R \in [0,1]$, described probabilistically by a semimeasure $P_R$ over the unit interval $[0,1]$. The second is some function $f$ such that $X = f(R)$. For some observation $x$ of the observable $X$, if $x = f(r)$, then $f, r$ is a probabilistic explanation of the observed data $x$. Here $r$ represents the hidden or uncontrolled variables of the context in which the value $x$ of $X$ is observed. The a-priori probability of occurrence of $x$ is given by:

$$P_{f,R}(x) = \int dP_R\, \{r : x = f(r)\},$$

where Lebesgue integration over $r$ is performed with respect to the measure $P_R$, and $f$ is assumed to be integrable.

For many contexts, it can be assumed that $f$ is partial computable and $P_R$ is enumerable. According to Lemma 2.1, these assumptions are, without loss of generality, equivalent to assuming $P_R$ uniform over $[0,1]$.
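The a-priori probability $P_f(x)$ of a partial computable $f$ can be approximated from below by looking only at finite prefixes of $r$, which is why $P_f$ is enumerable. The Python sketch below illustrates this with a hypothetical $f$ that reads bits of $r$ until the first 0 and outputs the number of 1s read. This $f$ is undefined only on $r = 111\dots$, a set of measure zero. The sketch uses exact rational arithmetic, and `run_f` and `lower_approx` are names introduced here for illustration.

```python
from fractions import Fraction
from itertools import product


def run_f(prefix):
    """Hypothetical partial computable f: read bits of r until the first 0 and
    output the number of 1s read. Returns None while the prefix does not yet
    determine the output."""
    for k, bit in enumerate(prefix):
        if bit == 0:
            return k
    return None


def lower_approx(x, L):
    """Measure of the cylinders [w], w in 2^L, on which f has already halted with
    output x; this is a lower approximation of P_f(x) that is nondecreasing in L."""
    total = Fraction(0)
    for w in product((0, 1), repeat=L):
        if run_f(w) == x:
            total += Fraction(1, 2 ** L)
    return total


print(", ".join(str(lower_approx(x, 6)) for x in range(3)))  # 1/2, 1/4, 1/8
```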

Lemma 2.1. If the variable $R$ is distributed according to an enumerable $P_R$, and $f$ is partial computable, then there is a partial computable $f'$ and a uniformly distributed variable $R'$ on $[0,1]$, such that for all $x$:

$$P_{f,R}(x) = P_{f',R'}(x).$$

Proof. First the function $a$ is defined inductively, where $P_t$ is an enumeration of $P_{f,R}$. For any $x$, let $a(x, 0) = 0$ and let

$$a(x,t) = a(2^n, t-1) + \sum\{P_t(z) - P_{t-1}(z) : z \le x\}.$$

Remark that for every $r'$ such that

$$r' \le \sum\{P(x) : x \in 2^n\},$$

there is a unique $t$ such that $r' \in [a(2^n, t-1), a(2^n, t)]$. Therefore, each such $r'$ defines a unique $x$ such that

$$r' \in [a(x-1, t), a(x, t)].$$

If $l(r')$ is long enough, then also

$$[r'] \subseteq [a(x-1,t), a(x,t)]$$

is satisfied, that is,

$$r' \in [a(x-1,t), a(x,t) - 2^{-l(r')}]. \qquad (2)$$

Let $f'(r')$ be defined to be $x$ if there is an $x$ such that (2) is satisfied, and undefined otherwise. Remark that $f'$ is partial computable and satisfies the conditions of the Lemma. □

From now on, the variable $R$ will be assumed to have the uniform distribution, and $P_f$ is short for $P_{f,R}$. According to Proposition 2.2, the set of explanatory models is equivalent to the set of enumerable semimeasures.

Proposition 2.2. For every partial computable $f$, the semimeasure $P_f$ is enumerable. For every enumerable semimeasure $P$, there is a partial computable function $f$ such that $P = P_f$.

Proof. The first claim follows by definition. Let $a(x,t)$ be as in the proof of Lemma 2.1. The second claim follows by choosing $f(r) = x$ if there is an $x$ such that for some $t$

$$a(x-1,t) \le r \le a(x,t) - 2^{-l(r)},$$

and $f(r) = \infty$ (undefined) otherwise. Remark that $f$ is partial computable and satisfies the conditions of the proposition. □
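The interval construction behind Lemma 2.1 and Proposition 2.2 can be sketched as follows. The sketch makes simplifying assumptions for illustration only: finitely many outcomes, finitely many stages, and $r$ treated as an exact rational number rather than a finite prefix. The increments $P_t(x) - P_{t-1}(x)$ are laid out consecutively in $[0,1]$, and $f$ maps $r$ to the outcome whose interval contains it. On the unused part of $[0,1]$, $f$ stays undefined. The Lebesgue measure of $f^{-1}(x)$ then equals the last enumerated value $P_t(x)$. The enumeration `enum` is hypothetical.

```python
from fractions import Fraction

# Hypothetical target semimeasure on two outcomes (total mass 3/4 <= 1).
TARGET = {"0": Fraction(1, 2), "1": Fraction(1, 4)}


def enum(x, t):
    """Hypothetical enumeration P_t(x): nondecreasing in t, P_0(x) = 0, limit TARGET[x]."""
    return TARGET[x] * (1 - Fraction(1, 2 ** t))


def build_intervals(outcomes, stages):
    """Lay out the increments P_t(x) - P_{t-1}(x) consecutively, stage by stage."""
    intervals = []
    start = Fraction(0)
    for t in range(1, stages + 1):
        for x in outcomes:  # fixed order of outcomes within each stage
            inc = enum(x, t) - enum(x, t - 1)
            if inc > 0:
                intervals.append((start, start + inc, x))
                start += inc
    return intervals


def decode(r, intervals):
    """f(r): the outcome whose half-open interval contains r, or None (undefined)."""
    for lo, hi, x in intervals:
        if lo <= r < hi:
            return x
    return None


def preimage_measure(x, intervals):
    """Lebesgue measure of f^{-1}(x) = total length of the intervals assigned to x."""
    return sum(hi - lo for lo, hi, y in intervals if y == x)


IV = build_intervals(["0", "1"], stages=4)
print(preimage_measure("0", IV))  # 15/32
```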

In [mill], the proof that high $K(K(x)|x)$ is rare considers $t_i$ as defined there, which increases faster than any computable function of $i$. For a prefix-free Turing machine, the proof shows that the probability that a program halts after time $t_i$ is bounded by $o(2^{-i})$. A similar proof shows the following: there are $x$'s for which the explanatory model needs more computation time than $t_i$ only for a small measure of hidden and uncontrolled variables $R$.



2.2 Causal explanations for a pair of observables 

Different types of explanatory models are defined, and the corresponding universal 
elements are compared. 

• An explanatory model for two discrete observables $X, Y$ is given by a partial computable function $f_{XY}$ and a variable $R$, uniformly distributed over $[0,1]$, such that:

$$(X, Y) = f_{XY}(R).$$
• An explanatory model for two independent discrete observables $X, Y$ is given by two partial computable functions $f_X, f_Y$ and two variables $R_X, R_Y$, independently and uniformly distributed over $[0,1]$, such that:

$$X = f_X(R_X)$$

$$Y = f_Y(R_Y).$$

• An explanatory model for two discrete variables $X, Y$ such that $X$ causes $Y$ is given by two partial computable functions $f_X, f_{Y|X}$ and two variables $R_X, R_Y$, independently and uniformly distributed over $[0,1]$, such that

$$X = f_X(R_X) \qquad (3)$$

$$Y = f_{Y|X}(X, R_Y). \qquad (4)$$

In a similar way as in Proposition 2.2, the explanatory models defined above are equivalent to sets of semimeasures.
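For a concrete instance of equations (3) and (4), the joint distribution can be computed exactly by enumerating finitely many bits of $R_X$ and $R_Y$. The Python sketch below uses hypothetical total functions that read at most two bits. $f_X$ outputs the first bit of $R_X$. $f_{Y|X}$ copies $X$ unless the first two bits of $R_Y$ are both 1, in which case it flips $X$. The result is $P(x) = 1/2$ and a flip probability $P(y \ne x \mid x) = 1/4$.

```python
from fractions import Fraction
from itertools import product


def f_x(r_x):
    """Hypothetical f_X: the first bit of R_X."""
    return r_x[0]


def f_y_given_x(x, r_y):
    """Hypothetical f_{Y|X}: copy x, but flip it when the first two bits of R_Y are 1."""
    return 1 - x if (r_y[0] == 1 and r_y[1] == 1) else x


def joint_distribution(bits=2):
    """Exact P(x, y), enumerating all prefixes of the given length of R_X and R_Y."""
    weight = Fraction(1, 2 ** (2 * bits))
    joint = {}
    for r_x in product((0, 1), repeat=bits):
        for r_y in product((0, 1), repeat=bits):
            x = f_x(r_x)
            y = f_y_given_x(x, r_y)
            joint[(x, y)] = joint.get((x, y), 0) + weight
    return joint


P = joint_distribution()
print(P[(0, 0)], P[(0, 1)])  # 3/8 1/8
```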

Proposition 2.3. The universal elements of the sets of semimeasures corresponding to the explanatory models for $X, Y$, respectively independent $X, Y$, and $X$ causing $Y$, are given by $m(x,y)$, respectively $m(x)m(y)$ and $m(x)m(y|x)$.

Proof. This follows from the corresponding results of Propositions 2.2 and 1.2. □

Remark that the universal semimeasure $m(x,y)$ can be factorized.

Lemma 2.4. Let $x^*$ be a program of length $K(x)$ that computes $x$; then

$$m(x,y) =^* m(y|x^*)\, m(x).$$

Proof. This follows by applying the coding theorem and the additivity of prefix-free Kolmogorov complexity [20]:

$$K(x) + K(y|x^*) =^+ K(x,y).$$

□

The corresponding test for the hypothesis that $x$ is independent from $y$, if $x$ is a probabilistic cause of $y$, is given by:

$$\frac{m(x)\, m(y|x)}{m(x)\, m(y)} = \frac{m(y|x)}{m(y)}.$$

The corresponding test for the hypothesis that $x$ is independent from $y$, if $x, y$ are generated in the most general way, is given by:

$$\frac{m(x,y)}{m(x)\, m(y)} =^* \frac{m(y|x^*)}{m(y)},$$

by Lemma 2.4. Remark that approximating this test might require a shortest model for $x$. By Proposition 2.5, these tests can differ.

Proposition 2.5. For every $n$ and all $x, y \in 2^n$:

$$\frac{m(y|x^*)}{m(y|x)} \le^* n.$$

For every $n$, there are $x, y \in 2^n$ such that

$$\frac{m(y|x^*)}{m(y|x)} \ge^* \frac{n}{\log n} \quad \text{and} \quad \frac{m(x|y)\, m(y)}{m(y|x)\, m(x)} \ge^* \frac{n}{\log n}.$$



Proof. The first claim of Proposition 2.5 follows from the coding theorem [20]:

$$-\log m(y|x^*) =^+ K(y|x^*)$$

$$-\log m(y|x) =^+ K(y|x).$$

By [1, Lemma 4.2] (see also the appendix), $K(x), x$ computes $x^*$, and by [20, page 242] it follows that

$$K(x^*|x) =^+ K(K(x)|x) \le^+ \log n.$$

Remark that $K(y|x) \le^+ K(y|x^*) + K(x^*|x)$. Combining the above equations shows the claim.

Now the second claim of Proposition 2.5 is shown. Remark that $K(x)$ can be computed from $x^*$, and that

$$K(x) =^+ K(K(x), x) =^+ K(K(x)) + K(x|K(x)^*).$$

Let $y = K(x)$. By applying the conditional coding theorem, it only needs to be shown that

$$K(K(x)|x) \ge^+ K(K(x)|x^*) + \log n - \log\log n.$$

Remark that $K(K(x)|x^*) =^+ 0$. By [20, Theorem 3.8.1], it follows that for every $n$ there is at least one $x$ such that $K(K(x)|x) \ge^+ \log n - \log\log n$. □

Corollary 2.6. There are hypotheses $S, T$ such that for every $n$, there are $x, y \in 2^n$ such that

$$\frac{m_{(S \times T)^e}(x,y)}{m_{S^e \times T^e}(x,y)} \ge^* \frac{n}{\log n}.$$

Proof. Let $S$ be the hypothesis that $x$ is generated by a partial computable function of a hidden variable $r_x$. Let $T$ be the hypothesis that $y$ is generated from $x$ by any function of $x$ and a hidden variable $r_y$. The universal element of $S^e$ is given by $m(x)$, and the universal element of $T^e$ is given by $m(y|x)$. By Proposition 1.2, the universal element of $S^e \times T^e$ is given by $m(x)m(y|x)$.

It will now be shown that the universal element of $(S \times T)^e$ is given by $m(x,y)$. First, remark that the semimeasure $m(x,y)/m(x)$ corresponds to some generalized structural equations. In these equations, $y$ is generated from $x$ and a hidden variable $r_y$ by some function $f(x, r)$ that is not partial computable. Since $m(x,y)$ is also enumerable, $m(x,y) \in (S \times T)^e$. Moreover, $m(x,y)$ is universal for the most general enumerable set of semimeasures, so it must be universal for $(S \times T)^e$.

Finally, it needs to be shown that for every $n$ there are $x, y \in 2^n$ such that:

$$\frac{m(x,y)}{m(x)\, m(y|x)} \ge^* \frac{n}{\log n}.$$

This follows from Proposition 2.5 and Lemma 2.4. □
Remark the following about the partial computable functions $f_\cdot$ in the definitions of the different explanatory models. If they were chosen computable, the corresponding sets would in general not have a universal element. Any computable semimeasures $P(x|y), P(y)$ also generate semimeasures $P(y|x), P(x)$ that satisfy $P(x|y)P(y) = P(y|x)P(x)$. This means that, for the likelihood of $x, y$, it does not matter whether $x$ is described first and then $y$, or vice versa. This contrasts with the likelihood obtained from enumerable universal semimeasures by Proposition 2.5. Therefore, when $K(K(x)|x)$ is large and $y$ contains much information about $K(x)$, the hypothesis that $x$ caused $y$ can be considered more plausible than the converse.

Before a last type of hypothesis is introduced, the definition of total arguments of partial computable functions is given.

Definition 2.7. A partial computable function $f(x,y)$ on $2^{<\omega} \times U$, for some set $U$, has a total argument $x$ iff for any $x$ and $y \in U$ such that $f(x,y)$ is defined, $f(w,y)$ is also defined for every $w \in 2^{l(x)}$.

The hypothesis of $y$ being totally caused by $x$ is given by partial computable functions $f_X, f_{Y|X}$ and two variables $R_X, R_Y$. Here $f_{Y|X}(X, R_Y)$ has a total argument $X$, and $R_X, R_Y$ are independently and uniformly distributed over $[0,1]$ and satisfy equations (3) and (4). In a similar way, this hypothesis corresponds to a set of enumerable semimeasures such that, for each $n$, every $P(\epsilon|y)$ with $y \in 2^n$ is constant. These semimeasures have a universal element, denoted here as $\overline{m}(y|x)$. By the following lemma, defining causality with this hypothesis can lead to fundamentally different results.

Lemma 2.8. For some $c$ and all $n \ge c$, there are $x, y \in 2^n$ such that:

$$\log \frac{m(y|x)}{\overline{m}(y|x)} \ge n - c\log n.$$

Proof. This follows from the standard coding theorem, the total coding theorem, and the existence, for any $n$, of $x, y \in 2^n$ such that [21 [53]

$$\overline{K}(y|x) - K(y|x) \ge n - 2\log n.$$

For the definition of the total conditional prefix-free complexity $\overline{K}$, see [2, 23]. □

This difference is due to Halting information present in $y$ [2]. It can be interpreted as follows. Suppose the computation of $f_{Y|X}$ requires a time $t_i$ (see above) for some large $i$. Then $x, r$ must contain a large amount of Halting information. Remark that $t_i$ contains about $i$ bits of Halting information [3]. For a general partial computable function $f_{Y|X}$, this Halting information can be obtained from both arguments of the function, $r$ and $x$. If $f_{Y|X}$ is total in its first argument and $f_{Y|X}(x, r)$ is defined, then a program can be made that generates $t_i$ from $f_{Y|X}$ and $r$. Therefore, if the computation of $y$ is so involved that it needs a time $t_i$, then $i$ bits of Halting information are present in $r$. The probability of such an $r$ decreases as $2^{-i}$. This is not the case for a general partial computable $f_{Y|X}$.

2.3 Causal and influence-free explanations for two timeseries 

Let $X, Y \in \omega^n$ be observables representing timeseries. The hypothesis that $X$ is an instantaneous cause of $Y$ is defined as the existence of partial computable functions $f_X, f_Y$ and variables $R_X, R_Y$, uniformly and independently distributed over $[0,1]^n$, such that for all $i \le n$:

$$X_i = f_X(X^{i-1}, Y^{i-1}, R_X^i)$$

$$Y_i = f_Y(X^i, Y^{i-1}, R_Y^i).$$
See figure 1, right, black and red.

The hypothesis that $X, Y$ are strict causal is defined as the existence of partial computable functions $f_X, f_Y$ and variables $R_X, R_Y$, uniformly and independently distributed over $[0,1]^n$, such that for all $i \le n$:

$$X_i = f_X(X^{i-1}, Y^{i-1}, R_X^i)$$

$$Y_i = f_Y(X^{i-1}, Y^{i-1}, R_Y^i).$$

See figure 2, black. Remark that by symmetry, if $X$ is a strict cause of $Y$, then $Y$ is a strict cause of $X$.

The hypothesis that $X$ is influence-free of $Y$ is defined as the existence of partial computable functions $f_X, f_Y$ and variables $R_X, R_Y$, uniformly and independently distributed over $[0,1]^n$, such that for all $i \le n$:

$$X_i = f_X(X^{i-1}, R_X^i)$$

$$Y_i = f_Y(X^i, Y^{i-1}, R_Y^i).$$

See figure 1, right, black and red.

The most general structure is obtained if hidden variables are shared. Therefore, the hypothesis that $X, Y$ can have hidden variables is given by partial computable functions $f_X, f_Y$ and a variable $R$, uniformly distributed over $[0,1]^n$, such that for all $i \le n$:

$$X_i = f_X(R^i)$$

$$Y_i = f_Y(R^i).$$

This model is equivalent both to the models of figure 1, left, and to those of figure 2, right, black and red.

2.4 Causal semimeasures 

The hypotheses described in the previous subsection correspond to sets of enumerable semimeasures, which are investigated in this subsection. For $x \in 2^n$ and $i \le n$, let

$$P(x^i) = \sum\{P(x^i w) : w \in 2^{n-i}\},$$

and similarly for $P(x^i, y^j)$ and $P(x^i|y)$. For $k \le i \le n$ and $l \le j \le n$, let

$$P(x^i, y^j \mid x^k, y^l) = \frac{P(x^i, y^j)}{P(x^k, y^l)}.$$

Definition 2.9. Let $x, y \in 2^n$.

• The causal semimeasure and the instantaneous causal semimeasure associated with a conditional semimeasure $P(x|y)$ are given by:

$$P(x|y{\uparrow}) = \prod\{P(x_i \mid x^{i-1}, y^{i-1}) : i \le n\}$$

$$P(x|y{\uparrow^+}) = \prod\{P(x_i \mid x^{i-1}, y^{i}) : i \le n\}.$$

• A conditional semimeasure $P(y|x)$ is causal, respectively instantaneous causal, iff, respectively,

$$P(y|x{\uparrow}) = P(y|x)$$

$$P(y|x{\uparrow^+}) = P(y|x).$$
• $x$ is influence-free of $y$ according to a semimeasure $P(x,y)$ iff:

$$P(x|y{\uparrow}) = P(x), \qquad (5)$$

when defined.
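For a measure $P$, the causal product $\prod_i P(x_i \mid x^{i-1}, y^{i-1})$ and the instantaneous causal product $\prod_i P(y_i \mid x^i, y^{i-1})$ telescope, so that their product equals $P(x,y)$. This identity is used in the step from (v) to (i) in the proof of Proposition 2.10. The Python sketch below checks it with exact arithmetic on a hypothetical, randomly generated, strictly positive joint measure over pairs of binary sequences of length 3. The helper names are introduced only for illustration.

```python
import random
from fractions import Fraction
from itertools import product


def random_joint(n, seed=0):
    """Hypothetical strictly positive joint measure P(x, y) on pairs of n-bit tuples."""
    rng = random.Random(seed)
    seqs = list(product((0, 1), repeat=n))
    weights = {(x, y): rng.randint(1, 9) for x in seqs for y in seqs}
    total = sum(weights.values())
    return {key: Fraction(w, total) for key, w in weights.items()}


def prefix_mass(P, xi, yj):
    """P(x^i, y^j): mass of all pairs whose x starts with xi and whose y starts with yj."""
    return sum(p for (x, y), p in P.items() if x[:len(xi)] == xi and y[:len(yj)] == yj)


def causal(P, x, y):
    """Causal product: prod_i P(x_i | x^{i-1}, y^{i-1})."""
    result = Fraction(1)
    for i in range(1, len(x) + 1):
        result *= prefix_mass(P, x[:i], y[:i - 1]) / prefix_mass(P, x[:i - 1], y[:i - 1])
    return result


def inst_causal(P, y, x):
    """Instantaneous causal product: prod_i P(y_i | x^i, y^{i-1})."""
    result = Fraction(1)
    for i in range(1, len(y) + 1):
        result *= prefix_mass(P, x[:i], y[:i]) / prefix_mass(P, x[:i], y[:i - 1])
    return result


P = random_joint(3)
print(all(causal(P, x, y) * inst_causal(P, y, x) == p for (x, y), p in P.items()))  # True
```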

Proposition 2.10. For any semimeasure $P(x,y)$, the following statements are equivalent:

(i) $P(y|x)$ is instantaneous causal.

(ii) $\forall i \le n\ \forall x, y \in 2^n\ [P(y^i|x) = P(y^i|x^i)]$ where defined.

(iii) $\forall i \le n\ \forall x, y \in 2^n\ [P(x|x^i, y^i) = P(x|x^i)]$ where defined.

(iv) $\forall i \le n\ \forall x, y \in 2^n\ [P(x_{i+1}|x^i, y^i) = P(x_{i+1}|x^i)]$ where defined.

(v) $x$ is influence-free of $y$ according to $P(x,y)$.

Proof. (i) → (ii):
Let

$$P(y^i|x\uparrow^+) = \prod\{P(y_j \mid y^{j-1}, x^j) : j \le i\}.$$

First it is shown that

$$P(y^i|x) \le P(y^i|x\uparrow^+). \qquad (6)$$

Suppose that for some $y^i, x$: $P(y^i|x) > P(y^i|x\uparrow^+)$; then for every $j = i+1, \ldots, n$,
choose $y_j$ such that $P(y_j|x, y^{j-1}) \ge P(y_j|x^j, y^{j-1})$. Note that this is always
possible. This shows that:

$$P(y|x) = P(y^i|x)\prod\{P(y_j|x, y^{j-1}) : j = i+1, \ldots, n\} > P(y^i|x\uparrow^+)\prod\{P(y_j|x^j, y^{j-1}) : j = i+1, \ldots, n\} = P(y|x\uparrow^+),$$

which contradicts (i). (ii) follows from (6) by

$$P(y^i|x^i) = \sum_{x' \sqsupseteq x^i} P(x'|x^i)\,P(y^i|x').$$

(ii) → (iii): By Bayes' theorem.

(iii) → (iv): By summing over all $x^{i+1}v$ with $v \in 2^{n-i-1}$.
(iv) → (v): By definition.
(v) → (i): By noting that

$$P(x|y\uparrow)\,P(y|x\uparrow^+) = P(x,y) = P(x)\,P(y|x).$$

□ 
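Proposition 2.10 can be checked numerically when a computable probability distribution is put in place of a universal semimeasure. The sketch below assumes a small example distribution (X fair coins, $Y_i = X_i$ XOR noise) stored on interleaved strings $x_1y_1\cdots x_ny_n$; the helper names are hypothetical. It confirms statement (v), $P(x|y\uparrow) = P(x)$, and statement (i), $P(y|x\uparrow^+) = P(y|x)$, for every pair of strings.

```python
from itertools import product
from math import isclose

def marginal(P, xs, ys):
    """Total weight of the interleaved leaves z = x1 y1 ... xn yn whose
    x-part starts with xs and whose y-part starts with ys."""
    return sum(p for z, p in P.items()
               if z[0::2][:len(xs)] == xs and z[1::2][:len(ys)] == ys)

def causal(P, x, y, plus=False):
    """P(x|y↑) = prod_i P(x_i | x^{i-1}, y^{i-1}), or P(x|y↑+) with plus=True."""
    out = 1.0
    for i in range(1, len(x) + 1):
        k = i if plus else i - 1
        out *= marginal(P, x[:i], y[:k]) / marginal(P, x[:i - 1], y[:k])
    return out

def swap(P):
    """Exchange the roles of x and y by re-interleaving as y1 x1 y2 x2 ..."""
    return {tuple(v for pair in zip(z[1::2], z[0::2]) for v in pair): p
            for z, p in P.items()}

# Example distribution (an assumption): X_i are fair coins and
# Y_i = X_i XOR noise with flip probability 0.2. Y reacts to the present
# of X, while X never looks at Y.
n, eps = 3, 0.2
P = {}
for x in product((0, 1), repeat=n):
    for y in product((0, 1), repeat=n):
        p = 0.5 ** n
        for xi, yi in zip(x, y):
            p *= eps if xi != yi else 1 - eps
        P[tuple(v for pair in zip(x, y) for v in pair)] = p
```

For every $x, y$, `causal(P, x, y)` agrees with `marginal(P, x, ())`, and `causal(swap(P), y, x, plus=True)` agrees with `marginal(P, x, y) / marginal(P, x, ())`.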

Note that the sets of causal and instantaneous causal semimeasures are testable
and convex. Therefore, they have universal elements $m(x|y\uparrow)$ and $m(x|y\uparrow^+)$.

The hypotheses introduced in the previous subsection correspond to sets of
enumerable semimeasures which have a universal element. This follows by the same
argument as Proposition 2.5. The corresponding universal elements are given by
Proposition 2.11.






Proposition 2.11. The universal element of the hypothesis that

1. X is an instantaneous cause of Y is given by $m(x|y\uparrow)\,m(y|x\uparrow^+)$.

2. X, Y are strict causal is given by $m(x|y\uparrow)\,m(y|x\uparrow)$.

3. X is influence-free of Y is given by $m(x)\,m(y|x\uparrow^+)$.

4. X, Y have hidden common variables is given by $m(x,y)$.

Proof. The corresponding sets of semimeasures are products of convex enumerable
sets of enumerable semimeasures. The result follows by Proposition 1.2. □



The universal elements define ideal hypothesis tests. Some of them can be
simplified within a constant factor, using $m(x,y) =^* m(x)\,m(y|x^*)$.

• Suppose that X is an instantaneous cause of Y;
are X, Y strict causal according to data x, y?
Figure 2, left.

$$\frac{m(x|y\uparrow)\,m(y|x\uparrow^+)}{m(x|y\uparrow)\,m(y|x\uparrow)} = \frac{m(y|x\uparrow^+)}{m(y|x\uparrow)} \qquad (7)$$

• Suppose that X, Y are strict causal;
is Y influence-free of X according to data x, y?
Figure 1, right.

$$\frac{m(x|y\uparrow)\,m(y|x\uparrow)}{m(x|y\uparrow)\,m(y)} = \frac{m(y|x\uparrow)}{m(y)} \qquad (8)$$

• Suppose X, Y can have hidden variables;
is X an instantaneous cause of Y according to data x, y?

$$\frac{m(x,y)}{m(x|y\uparrow)\,m(y|x\uparrow^+)} \qquad (9)$$



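• Suppose X, Y can have hidden variables;
are X, Y strict causal?
Figure 2, right.

$$\frac{m(x,y)}{m(x|y\uparrow)\,m(y|x\uparrow)} \qquad (10)$$

• Suppose X, Y can have hidden variables;
is Y influence-free of X?

$$\frac{m(x,y)}{m(x|y\uparrow^+)\,m(y)} =^* \frac{m(x|y^*)\,m(y)}{m(x|y\uparrow^+)\,m(y)} = \frac{m(x|y^*)}{m(x|y\uparrow^+)} \qquad (11)$$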



The significances of ideal independence tests can now be decomposed as the product
of the significances of the tests above: for example, as the product of the tests of
equations (10), (8), and (8) applied to x (that is, with the roles of x and y exchanged),
or as the product of the tests (9), (7), (8), and (8) applied to x:

$$\frac{m(x,y)}{m(x)\,m(y)} = \frac{m(x,y)}{m(x|y\uparrow)\,m(y|x\uparrow)}\cdot\frac{m(y|x\uparrow)}{m(y)}\cdot\frac{m(x|y\uparrow)}{m(x)} \qquad (12)$$
$$= \frac{m(x,y)}{m(x|y\uparrow)\,m(y|x\uparrow^+)}\cdot\frac{m(y|x\uparrow^+)}{m(y|x\uparrow)}\cdot\frac{m(y|x\uparrow)}{m(y)}\cdot\frac{m(x|y\uparrow)}{m(x)}.$$
Figure 1: Left: general system. Right: suppose that X, Y are strict causal; is Y
influence-free of X?

Figure 2: Left: suppose that X is an instantaneous cause of Y; are X, Y strict causal?
Right: suppose X, Y can have hidden variables; are X, Y strict causal?
2.5 Information transfer and instantaneous information transfer

Equation (12) allows a nice information-theoretic interpretation. Let

$$I(x;y) = \log\frac{m(x,y)}{m(x)\,m(y)},$$
$$EIT(x \leftarrow y) = \log\frac{m(x|y\uparrow)}{m(x)},$$
$$EIT(x\uparrow; y\uparrow) = \log\frac{m(x,y)}{m(x|y\uparrow)\,m(y|x\uparrow)}.$$

Equation (12) now becomes:

$$I(x;y) = EIT(x \leftarrow y) + EIT(y \leftarrow x) + EIT(x\uparrow; y\uparrow).$$



Suppose that x, y have no instantaneous connections; then the mutual information
of x, y can be considered as the sum of the information flowing from the past of x to y,
from the past of y to x, and the information obtained by x, y through a hidden source.

However, a decomposition of mutual information as the sum of the information flowing
from the past and the present of x to y and from the past of y to x is not possible.
Also, in this setting, the instantaneous information flow can differ depending on
whether information is assumed to flow instantaneously from x to y or from y to x.
Both claims follow from Proposition 2.12.

Proposition 2.12. For every $n$ there exist $x, y$ such that:

$$EIT(x\uparrow; y\uparrow) - \log\frac{m(y|x\uparrow^+)}{m(y|x\uparrow)} \ge \Omega(n).$$
For there exist x,y £ lu'"' such that: 

/ I t\ ~ T^~rV ^ o 2^{log + log : I < n}). 

m[x\y \ ) m(y\x \) 



Proof. The first claim follows from Lemma 2.13; the second claim follows by an
analogue of Lemma 2.13 and from [1, Proposition ...].¹ □



Lemma 2.13. There is a constant c such that for any $n$, there exist $x, y \in 2^n$
such that

$$\log\frac{m(x,y)}{m(x|y\uparrow)\,m(y|x\uparrow^+)} \ge cn.$$

Proof. In [3], on-line decision complexity KR and on-line a priori complexity KA are
defined, and it is shown that for the task $T = \epsilon \to x_1;\ y_1 \to x_2;\ \ldots;\ y_{n-1} \to x_n$:

$$KR(T) \le^+ KA(T) + 2\log KR(T). \qquad (13)$$

Note that in this result all quantities can be conditioned on their length, and
that KA can be replaced by $-\log m(x|y\uparrow)$ for any enumerable on-line semimeasure.
The analogue for $-\log m(x|y\uparrow^+)$ also holds. Let $K(x|y\uparrow)$ and $K(x|y\uparrow^+)$ be the
length-conditional on-line decision complexities as defined in [1]. On-line decision complexity



and length-conditional on-line decision complexity are related within $2\log n$ terms.
Therefore:

$$K(x|y\uparrow) \le -\log m(x|y\uparrow) + O(\log n),$$
$$K(y|x\uparrow^+) \le -\log m(y|x\uparrow^+) + O(\log n).$$

¹ The current draft version of the paper from August 2009 contains only the proof for
the result in Lemma 2.13; however, the proof can be reformulated such that this result
is also shown.

By the coding theorem:

$$K(x,y) =^+ -\log m(x,y). \qquad (14)$$

By the main result of [1], for each n there exist $x, y \in 2^n$ such that:

$$K(x|y\uparrow) + K(y|x\uparrow^+) - K(x,y) \ge \Omega(n).$$

Combining the above equations proves the lemma. □

3 Associated causal semimeasures 

In the previous section, enumerable causal semimeasures were derived as corresponding
to structural equations with partial computable functions. In this section, causal
semimeasures are investigated that are associated in a Bayesian way with enumerable
conditional semimeasures and enumerable bivariate semimeasures.

Definition 3.1. A semimeasure $P(x|y)$ is associated causal, respectively associated
instantaneous causal, if there is an enumerable conditional semimeasure $Q(x|y)$ such
that $P(x|y) = Q(x|y\uparrow)$, respectively $P(x|y) = Q(x|y\uparrow^+)$.

Note that an enumerable causal semimeasure $P(x|y)$ is associated causal, since it
equals its own association. Also note that the set of associated causal semimeasures
is not convex. Since a conditional semimeasure is associated with any bivariate
semimeasure $P(x,y)$, one can also associate causal and instantaneous causal
semimeasures with $P(x,y)$.

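Lemma 3.2.

$$P(x,y) = P(\epsilon,\epsilon)\,P(x|y\uparrow)\,P(y|x\uparrow^+). \qquad (15)$$

Proof. Note that for the causal semimeasures $P(x|y\uparrow)$ and $P(x|y\uparrow^+)$ associated
with $P(x,y)$ one has

$$P(x|y\uparrow) = \frac{P(x^1,y^0)}{P(x^0,y^0)}\cdots\frac{P(x^n,y^{n-1})}{P(x^{n-1},y^{n-1})},$$
$$P(x|y\uparrow^+) = \frac{P(x^1,y^1)}{P(x^0,y^1)}\cdots\frac{P(x^n,y^n)}{P(x^{n-1},y^n)}.$$

In the same way, $P(y|x\uparrow^+) = \prod_{i \le n} P(x^i,y^i)/P(x^i,y^{i-1})$. Multiplying this
product with the one for $P(x|y\uparrow)$, all intermediate factors cancel, and what remains
is $P(x^n,y^n)/P(x^0,y^0) = P(x,y)/P(\epsilon,\epsilon)$.

□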
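The telescoping argument of Lemma 3.2 can be replayed on a finite example. The sketch below assumes an arbitrary defective semimeasure on interleaved strings $x_1y_1\cdots x_ny_n$ (random positive weights with total mass 0.7, so $P(\epsilon,\epsilon) = 0.7$); the helper names are hypothetical. Conditional weights at odd positions multiply into $P(x|y\uparrow)$, those at even positions into $P(y|x\uparrow^+)$.

```python
import random
from itertools import product

def prefix_weight(P, prefix):
    """P(prefix): total weight of the leaves extending an interleaved prefix."""
    k = len(prefix)
    return sum(p for z, p in P.items() if z[:k] == prefix)

def chain_factors(P, z):
    """Return (P(x|y↑), P(y|x↑+)) for the interleaved leaf z = x1 y1 ... xn yn.

    x-positions (1-based odd) contribute P(x_i | x^{i-1}, y^{i-1});
    y-positions (1-based even) contribute P(y_i | x^i, y^{i-1}).
    """
    px, py = 1.0, 1.0
    for j in range(1, len(z) + 1):
        ratio = prefix_weight(P, z[:j]) / prefix_weight(P, z[:j - 1])
        if j % 2 == 1:
            px *= ratio
        else:
            py *= ratio
    return px, py

# A defective semimeasure on 2^n x 2^n: arbitrary positive weights, mass 0.7.
rng = random.Random(1)
n = 3
leaves = list(product((0, 1), repeat=2 * n))
w = [rng.random() + 0.01 for _ in leaves]
P = {z: 0.7 * wi / sum(w) for z, wi in zip(leaves, w)}
```

For every leaf `z`, `P[z]` equals `prefix_weight(P, ()) * px * py` up to rounding, which is equation (15).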



3.1 Non-existence of universal elements

Proposition 3.3. If $P(x|y\uparrow)$ is a causal semimeasure associated with an enumerable
semimeasure $P(x|y) > 0$, then there exist an enumerable semimeasure $Q(x|y)$ and
$x, y \in 2^n$ such that

$$\log\frac{Q(x|y\uparrow)}{P(x|y\uparrow)} \ge \Omega(n). \qquad (16)$$

Proof of Proposition 3.3, first part: definition of Algorithm 1.

Let N = 2n and let the set 2" x 2" be associated with 2^ by mapping x,y to 
z = xiyiX2y2---Xnyn- Let with abuse of notation P denote the restriction of P on 2^. 
For any restricted semimeasure P and u e 2%i ^ iV, let P(v...) denote the restriction 
of P on the strings vu for all u G 2^~*. For b G {0, 1}, let 6 = 1 — 6. The strings of 
2^ can be considered as branches in a tree. For z £ 2^ , 2 is a local minimal branch, 
iff it satisfies for all z ^ n: 

P{z') P{z'-'zi). 

For a local minimal branch z, the nodes z'^'^^^ z^''+^ for i ^ n—1 are called load nodes. 
Algorithm [T] generates for every restriction P on 2" of a computable semimeasure a 
computable semimeasure Q on 2" such that all leafs w have half weight, it is Q{w) = 
P{w)/2, except for leafs descending from load nodes which have Qiw) = P{w). This 
implies that the weights of the uneven local minimal nodes are proportionally more 
heavy than the weights of the even local minimal nodes, which shows the result of 
Lemma IBIH 



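Lemma 3.4. If $P(x|y\uparrow)$ is a causal semimeasure associated with a computable
semimeasure $P(x,y) > 0$, then $Q = grow(P)$, with algorithm grow defined in
Algorithm 1, is computable and satisfies, for the local minimal branch $z = x_1y_1\cdots x_ny_n$:

$$\log\frac{Q(x|y\uparrow)}{P(x|y\uparrow)} \ge \Omega(n).$$

Proof. Algorithm 1 constructs $Q$ from $P$ such that:

$$Q(w) = \begin{cases} P(w) & \text{if } w \text{ is a load leaf},\\ \tfrac12 P(w) & \text{otherwise}.\end{cases}$$

For $i < N$ and $z$ the local minimal leaf,

$$P(z^{i+1}) \le \tfrac12 P(z^i),$$

and for $i < n$,

$$Q(z^{2i}) - \tfrac12 P(z^{2i}) = Q(z^{2i+1}) - \tfrac12 P(z^{2i+1}) \ge \tfrac12 P(z^{2i+1}\bar z_{2i+2}) \ge \tfrac14 P(z^{2i+1}).$$

This shows that:

$$\frac{Q(z^{2i+1})}{Q(z^{2i})} = \frac{\tfrac12 P(z^{2i+1}) + Q(z^{2i+1}) - \tfrac12 P(z^{2i+1})}{\tfrac12 P(z^{2i}) + Q(z^{2i+1}) - \tfrac12 P(z^{2i+1})} \ge \frac{\tfrac12 P(z^{2i+1}) + \tfrac14 P(z^{2i+1})}{\tfrac12 P(z^{2i}) + \tfrac14 P(z^{2i+1})} = \frac{P(z^{2i+1})}{P(z^{2i})}\cdot\frac{3/2}{1 + \tfrac12 P(z^{2i+1})/P(z^{2i})} \ge \frac{6\,P(z^{2i+1})}{5\,P(z^{2i})},$$

where the first inequality holds because adding the same nonnegative term to the numerator
and the denominator increases the ratio when $P(z^{2i+1}) < P(z^{2i})$. Taking the product over $i < n$:

$$-\log Q(x|y\uparrow) \le -\log P(x|y\uparrow) - n\log\frac65.$$

Note that Algorithm 1 constructs a computable Q from a computable P. □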



Data: P
Result: Q
begin
    Q ← P/2
    z ← local minimal branch in $2^N$
    for i from 0 to n − 1 do
        $Q(z^{2i+1}\bar z_{2i+2}\ldots)$ ← $P(z^{2i+1}\bar z_{2i+2}\ldots)$    (load node)
    end
end

Algorithm 1: grow
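The following Python sketch runs Algorithm 1 on an explicit finite tree and measures the gain that Lemma 3.4 predicts. It assumes a computable semimeasure given by a table of random positive leaf weights on $2^N$ with $N = 2n$, and breaks ties toward the 0-child when following the local minimal branch; these choices and the helper names are assumptions made for illustration.

```python
import random
from itertools import product

def weight(P, prefix):
    """P(prefix): total weight of the leaves below an interleaved prefix."""
    k = len(prefix)
    return sum(p for w, p in P.items() if w[:k] == prefix)

def local_minimal_branch(P, depth):
    """Walk down the tree, always taking the lighter child (ties go to 0),
    so that P(z^i) <= P(z^{i-1} zbar_i) for every i."""
    z = ()
    for _ in range(depth):
        z += (0,) if weight(P, z + (0,)) <= weight(P, z + (1,)) else (1,)
    return z

def grow(P, n):
    """Sketch of Algorithm 1: halve every leaf, except the leaves below the
    load nodes z^{2i+1} zbar_{2i+2} (0 <= i <= n-1), which keep weight P(w)."""
    z = local_minimal_branch(P, 2 * n)
    loads = [z[:2 * i + 1] + (1 - z[2 * i + 1],) for i in range(n)]
    Q = {w: p if any(w[:len(v)] == v for v in loads) else p / 2
         for w, p in P.items()}
    return Q, z

def causal_x(P, z):
    """P(x|y↑) for the interleaved leaf z = x1 y1 ... xn yn: the product of
    the conditional weights at the x-positions (0-based indices 0, 2, 4, ...)."""
    out = 1.0
    for i in range(0, len(z), 2):
        out *= weight(P, z[:i + 1]) / weight(P, z[:i])
    return out

# Random positive leaf weights for n = 4 (N = 8), normalized to total mass 1.
rng = random.Random(0)
n = 4
P = {w: rng.random() + 1e-3 for w in product((0, 1), repeat=2 * n)}
total = sum(P.values())
P = {w: p / total for w, p in P.items()}
Q, z = grow(P, n)
ratio = causal_x(Q, z) / causal_x(P, z)   # Lemma 3.4 predicts >= (6/5)**n
```

Because every leaf of Q weighs at most the corresponding leaf of P, Q is again a semimeasure, while its causal conditional along the local minimal branch gains at least a factor 6/5 per x-position.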



Data: $P_t$
Result: $Q_t$
begin
    $v$ ← $\min\{P_0(w) : w \in 2^N\}$
    $Q_0$ ← grow($2^{-1/v} P_0$)
    $S$ ← $P_0(\epsilon)$
    $s$ ← 0
    for t from 1 to ∞ do
        if $P_t(\epsilon) - S > v$ then    (stage s: a new $Q_t$ is grown)
            $S$ ← $P_t(\epsilon)$
            $s$ ← $s + 1$
            $Q_t$ ← grow($2^{-1/v+s} P_t$)
        else
            $Q_t$ ← $Q_{t-1}$
    end
end

Algorithm 2: grow_semimeasure



Proof of Proposition 3.3, second (last) part.

The causal semimeasure associated with a bivariate semimeasure $P(x,y)$ is the causal
semimeasure associated with the conditional semimeasure $P(x|y)$. For every
enumerable conditional semimeasure $P(x|y) > 0$, there is an enumerable semimeasure
$Q(x,y) > 0$ such that $Q(x|y) = P(x|y)$. Therefore, to show Proposition 3.3 it suffices
to show the proposition for causal semimeasures associated with bivariate
semimeasures.

Let $v = \min\{P_0(w) : w \in 2^N\}$. Note that the enumeration $P_t$ can be chosen
such that $v > 0$, since $P(x,y) > 0$. Algorithm 2 uses Algorithm 1 to define an
enumerable semimeasure $Q_t$ from the enumerable semimeasure $P_t$. It is now shown
that $Q_{t+1} \ge Q_t$. If $t$ and $t+1$ are in the same stage $s$, this is easily observed. If
at time $t+1$ a new stage $s+1$ is reached, a new $Q_{t+1}$ is grown from $P_{t+1}$
multiplied by a factor $2^{-1/v+s+1}$, which is doubled relative to the previous stage.
Therefore, if $w$ was a non-load leaf at time $t$ and becomes a load leaf at time $t+1$,
one still has $Q_{t+1}(w) \ge Q_t(w)$. By Lemma 3.4 it follows, for every $t$ that initiates a
new stage, that:

$$\log\frac{Q_t(x|y\uparrow)}{P_t(x|y\uparrow)} \ge \Omega(n).$$

By Lemma 3.5 this equation also holds for the $t$ subsequent to those initiating a new
stage. Therefore, the equation holds for any $t$. □

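Lemma 3.5. Suppose that for some $v > 0$, semimeasures $P, Q$ on $2^N$
satisfy, for all $w \in 2^N$,

$$P(w) \ge v, \qquad Q(w) \ge P(w), \qquad Q(\epsilon) \le P(\epsilon) + v;$$

then

$$\frac12 \le \frac{Q(x|y\uparrow)}{P(x|y\uparrow)}.$$

Proof. Note that for any $i \le N$ and $w \in 2^i$, one has $Q(w) \le P(w) + v$ and, since any
node of depth $i$ has $2^{N-i}$ leaves below it, $P(w) \ge v2^{N-i}$. Hence, writing $w$ for the
interleaved string of $x, y$:

$$Q(x|y\uparrow) = \prod_{i<n}\frac{Q(w^{2i+1})}{Q(w^{2i})} \ge \prod_{i<n}\frac{P(w^{2i+1})}{P(w^{2i}) + P(w^{2i})\,2^{-N+2i}} = P(x|y\uparrow)\prod_{i<n}\frac{1}{1 + 2^{-N+2i}} \ge \tfrac12 P(x|y\uparrow),$$

where the last step uses $\prod_{k \ge 1}(1 + 4^{-k}) < 2$.

□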



3.2 Causal semimeasures associated with a universal semimeasure

Let $\bar m(x|y\uparrow)$ be the causal semimeasure associated with $m(x,y)$ and let $\tilde m(x|y\uparrow)$ be
the causal semimeasure associated with $m(x|y)$. As before, $m(x|y\uparrow)$ denotes the universal
enumerable causal semimeasure.

Proposition 3.6. There is a constant $c > 0$ such that for all $n$, there are
$x, y \in 2^n$ with

$$\log\frac{\bar m(x|y\uparrow)}{m(x|y\uparrow)} \ge cn,$$

and there are $x, y \in 2^n$ with

$$\log\frac{\tilde m(x|y\uparrow)}{m(x|y\uparrow)} \ge cn.$$

Proof. The first claim of Proposition 3.6 is proved first. Let $\bar m_t(x|y\uparrow)$ be associated
with $m_t(x,y)$. Suppose the first claim is not true; then for any constant $c > 0$ and
any $x, y \in 2^n$:

$$\bar m(x|y\uparrow) \le 2^{cn}\, m(x|y\uparrow).$$

Fix some enumeration $m_t(x|y\uparrow)$ of $m(x|y\uparrow)$. It can now be assumed that for some
enumeration $m_t(x,y)$ of $m(x,y)$, one has:

$$\bar m_t(x|y\uparrow) \le 2^{cn}\, m_t(x|y\uparrow).$$

Let

$$\mu_t(x|y\uparrow) = 2^{cn}\, m_t(x|y\uparrow), \qquad \mu_t(y|x\uparrow^+) = \frac{m_t(x,y)}{\mu_t(x|y\uparrow)}. \qquad (17)$$

Note that $\mu_t(x|y\uparrow)$ may not be a semimeasure, but $\mu_t(y|x\uparrow^+)$ is a causal semimeasure
by Lemma 3.2 and using $m(\epsilon,\epsilon) \le 1$. The $\mu$ functions are limit-computable, and are used to
define the following $\nu$ functions:

$$s_t = \arg\max_s\{\mu_s(y|x\uparrow^+) : s \le t\}, \qquad (18)$$
$$\nu_t(y|x\uparrow^+) = \mu_{s_t}(y|x\uparrow^+), \qquad (19)$$
$$\nu_t(x|y\uparrow) = \mu_{s_t}(x|y\uparrow)\,\frac{m_t(x,y)}{m_{s_t}(x,y)}. \qquad (20)$$

Note that $\nu_t(y|x\uparrow^+)$ is increasing in $t$. Also $\nu_t(x|y\uparrow)$ is increasing in $t$, since
Equation (20) shows that if $s_t = s_{t+1}$ then $\nu_t(x|y\uparrow) \le \nu_{t+1}(x|y\uparrow)$, and for any $t$ such
that $s_t < s_{t+1}$, note that $s_{t+1} = t + 1$ and, by equation (20):

$$\nu_t(x|y\uparrow) \le \mu_t(x|y\uparrow) \le \mu_{t+1}(x|y\uparrow) = \nu_{t+1}(x|y\uparrow).$$

Note that equations (18) and (17) show that:

$$\nu_t(x|y\uparrow)\,\nu_t(y|x\uparrow^+) = \mu_{s_t}(x|y\uparrow)\,\frac{m_t(x,y)}{m_{s_t}(x,y)}\,\mu_{s_t}(y|x\uparrow^+) = m_t(x,y).$$

This shows that:

$$m(x,y) = \nu(x|y\uparrow)\,\nu(y|x\uparrow^+) \le 2^{cn}\, m(x|y\uparrow)\,\nu(y|x\uparrow^+) \le^* 2^{cn}\, m(x|y\uparrow)\,m(y|x\uparrow^+).$$

This equation is valid for any c and $x, y \in 2^n$, contradicting Lemma 2.13. This shows
the first claim of Proposition 3.6.

The second claim of the Proposition follows by replacing $m_t(x,y)$ by

$$\tilde m(x,y) = m(x|y)\,m(y).$$

Note that $\tilde m(x|y\uparrow)$ is also the causal semimeasure associated with $\tilde m(x,y)$. The
analogous contradiction of Equation (3.2) is derived by noting that, by Proposition

$$m(x,y) \le^* m(x|y)\,m(y)\,n.$$

□

Corollary 3.7. $\bar m(x|y\uparrow)$ and $\tilde m(x|y\uparrow)$ are not enumerable.



Question 3.8. Let S be the set of causal semimeasures associated with a universal
enumerable semimeasure. How much can two elements of S differ? Does S have a universal
element?

The relations between the sets of associated and enumerable causal semimeasures
are represented in Figure 3.




Figure 3: Relations between sets of causal semimeasures and existence of universal elements. 



3.3 Associated information transfer and instantaneous common information

For any universal semimeasure m, the associated causal semimeasure will be denoted
as $\bar m(x|y\uparrow)$. Associated information transfer and instantaneous common information
are given by:

$$AIT(x \leftarrow y) = \log\frac{\bar m(x|y\uparrow)}{m(x)},$$
$$AIT(x\uparrow; y\uparrow) = \log\frac{m(x,y)}{\bar m(x|y\uparrow)\,\bar m(y|x\uparrow)}.$$

Note that:

$$I(x;y) = AIT(x \leftarrow y) + AIT(y \leftarrow x) + AIT(x\uparrow; y\uparrow).$$

Associated simultaneous information transfer also has another representation:

$$AIT(x\uparrow; y\uparrow) =^+ \log\frac{\bar m(x|y\uparrow^+)}{\bar m(x|y\uparrow)} = \log\frac{\bar m(y|x\uparrow^+)}{\bar m(y|x\uparrow)}.$$

This means that, in contrast with the enumerable instantaneous common information,
the associated instantaneous common information can be interpreted either as
an instantaneous information flow from x to y, or from y to x, or as a
simultaneous flow from a hidden source to x and y.
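The two representations of the associated instantaneous term rest on the chain rule in both interleaving orders, $P(x,y) = P(\epsilon,\epsilon)P(x|y\uparrow)P(y|x\uparrow^+) = P(\epsilon,\epsilon)P(y|x\uparrow)P(x|y\uparrow^+)$. The sketch below checks this with a computable stand-in for $m$: an arbitrary strictly positive probability distribution (so $P(\epsilon,\epsilon) = 1$ and the additive constant vanishes). The distribution and function names are assumptions for illustration.

```python
import random
from itertools import product
from math import log2

def marginal(P, xs, ys):
    """P(x^i = xs, y^j = ys) for a semimeasure P keyed by pairs (x, y)."""
    return sum(p for (x, y), p in P.items()
               if x[:len(xs)] == xs and y[:len(ys)] == ys)

def causal_x(P, x, y, plus):
    """P(x|y↑) = prod P(x_i | x^{i-1}, y^{i-1}); plus=True gives P(x|y↑+)."""
    out = 1.0
    for i in range(1, len(x) + 1):
        k = i if plus else i - 1
        out *= marginal(P, x[:i], y[:k]) / marginal(P, x[:i - 1], y[:k])
    return out

def causal_y(P, y, x, plus):
    """P(y|x↑) = prod P(y_i | x^{i-1}, y^{i-1}); plus=True gives P(y|x↑+)."""
    out = 1.0
    for i in range(1, len(y) + 1):
        k = i if plus else i - 1
        out *= marginal(P, x[:k], y[:i]) / marginal(P, x[:k], y[:i - 1])
    return out

def instantaneous_forms(P, x, y):
    """Joint form, x-side form and y-side form of the instantaneous term (bits)."""
    joint = log2(P[(x, y)] / (causal_x(P, x, y, False) * causal_y(P, y, x, False)))
    x_side = log2(causal_x(P, x, y, True) / causal_x(P, x, y, False))
    y_side = log2(causal_y(P, y, x, True) / causal_y(P, y, x, False))
    return joint, x_side, y_side

# Arbitrary strictly positive distribution on 2^3 x 2^3 with total mass 1.
rng = random.Random(2)
words = list(product((0, 1), repeat=3))
weights = {(x, y): rng.random() + 0.01 for x in words for y in words}
total = sum(weights.values())
P = {key: w / total for key, w in weights.items()}
```

All three values returned by `instantaneous_forms` coincide up to rounding for every pair, which is why the instantaneous term can be read as a flow in either direction.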

4 Shannon information transfer and minimal sufficient statistics

4.1 Granger causality and Shannon information transfer 

Statistical tests used in the engineering literature can often be structured as follows: first
a model is fitted to the data and then influence is derived from:

• Some parameters in the model, as for example by the use of directed transfer 
functions [TB] and partial directed coherences [?7] . 

• The complexity or magnitude of the noise of the data relative to the model, as 
for example with the use of Granger Causality [TSl [101 [HI [HI [S] , and Shannon 
information transfer 

By an on-line coding theorem 0, the ideal statistical tests based on enumerable
information transfer can informally be assumed to derive influence from the sum of
the complexity of the model and the complexity of the noise. It is not clear whether
such algorithms perform better [5].

Let $E(X^+|X^-)$ denote some average error of a prediction strategy for observations
of the observable X given its past observations. Let $E(X^+|X^-, Y^-)$ be similar, where
the prediction strategy also uses the past of Y. In its most general form [10, 11], Y
is said to Granger-cause X iff

$$E(X^+|X^-) - E(X^+|X^-, Y^-)$$

is large. The most common choice for $E(\cdot|\cdot)$ to define Granger causality is the mean
squared error.
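As an illustration of the mean-squared-error version, the following pure-Python sketch compares a one-lag autoregression of X on its own past with one that also uses the past of Y. The lag order, the absence of an intercept, and the simulated system in which Y drives X are assumptions chosen for the example; practical procedures fit richer models and attach a significance test.

```python
import random

def ols_mse(rows, target):
    """Mean squared residual of a least-squares fit of target on the columns
    of rows (no intercept). The normal equations are solved by Gaussian
    elimination, which is adequate for a few well-conditioned regressors."""
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    for c in range(k):                      # forward elimination
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * ac for a, ac in zip(A[r], A[c])]
            b[r] -= f * b[c]
    beta = [0.0] * k
    for c in reversed(range(k)):            # back substitution
        s = sum(A[c][j] * beta[j] for j in range(c + 1, k))
        beta[c] = (b[c] - s) / A[c][c]
    res = [t - sum(bi * ri for bi, ri in zip(beta, r)) for r, t in zip(rows, target)]
    return sum(e * e for e in res) / len(res)

def granger_gain(x, y):
    """E(X+|X-) - E(X+|X-,Y-) with E = mean squared error and one lag."""
    target = x[1:]
    restricted = [[xp] for xp in x[:-1]]
    full = [[xp, yp] for xp, yp in zip(x[:-1], y[:-1])]
    return ols_mse(restricted, target) - ols_mse(full, target)

# Simulated example (an assumption): y drives x with a one-step delay.
rng = random.Random(0)
x, y = [0.0], [0.0]
for t in range(1, 2000):
    y.append(0.5 * y[-1] + rng.gauss(0, 1))
    x.append(0.5 * x[-1] + 0.8 * y[-2] + rng.gauss(0, 1))

gain_y_to_x = granger_gain(x, y)   # clearly positive
gain_x_to_y = granger_gain(y, x)   # close to zero
```

Since the full model nests the restricted one, the gain is never negative; only its size, relative to what chance would produce, carries information.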

Another choice for $E(\cdot|\cdot)$ is Shannon entropy. The following expressions provide
definitions for Shannon mutual information, information transfer, and instantaneous
mutual information:

$$SI_P(X;Y) = \sum_{x,y} P(x,y)\log\frac{P(x,y)}{P(x)P(y)},$$
$$SIT_P(X \leftarrow Y) = \sum_{x,y} P(x,y)\log\frac{P(x|y\uparrow)}{P(x)},$$
$$SIT_P(X\uparrow; Y\uparrow) = \sum_{x,y} P(x,y)\log\frac{P(x,y)}{P(x|y\uparrow)\,P(y|x\uparrow)}.$$

Note that

$$SI_P(X;Y) = SIT_P(X \leftarrow Y) + SIT_P(Y \leftarrow X) + SIT_P(X\uparrow; Y\uparrow).$$

A general procedure for deriving influence in the procedures of [Ml I2H1 131] is
given by fitting some models $P(X_t|X_{t-k\ldots t-1})$ and $P(X_t|X_{t-k\ldots t-1}, Y_{t-k\ldots t-1})$ to the
corresponding data segments, similarly for $Y_t$, and finally computing the statistic
$SIT_P(X \leftarrow Y) - SIT_P(Y \leftarrow X)$. A confidence for the sign of the statistic can be
obtained by running the procedure on some randomized permutation of the sequences
X and Y.
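The procedure can be sketched with plug-in (frequency) estimates for binary sequences. The sketch assumes history length k = 1, measures entropies in bits, and obtains the permutation reference by shuffling only Y, for a fixed number of rounds; these are illustrative choices, since the text leaves the model class and the randomization open.

```python
import random
from collections import Counter
from math import log2

def entropy(counter):
    """Plug-in Shannon entropy (bits) of the empirical distribution."""
    total = sum(counter.values())
    return -sum(c / total * log2(c / total) for c in counter.values())

def transfer(target, source):
    """Estimate H(X_t | X_{t-1}) - H(X_t | X_{t-1}, Y_{t-1}) with X = target, Y = source."""
    t, tp, sp = target[1:], target[:-1], source[:-1]
    h_restricted = entropy(Counter(zip(t, tp))) - entropy(Counter(tp))
    h_full = entropy(Counter(zip(t, tp, sp))) - entropy(Counter(zip(tp, sp)))
    return h_restricted - h_full

def statistic(x, y):
    """Transfer from Y to X minus transfer from X to Y (order-1 plug-in models)."""
    return transfer(x, y) - transfer(y, x)

def permutation_pvalue(x, y, rounds=50, seed=0):
    """Fraction of shuffled-Y statistics at least as large as the observed one."""
    rng = random.Random(seed)
    observed = statistic(x, y)
    hits = 0
    for _ in range(rounds):
        y_perm = y[:]
        rng.shuffle(y_perm)
        if statistic(x, y_perm) >= observed:
            hits += 1
    return observed, (hits + 1) / (rounds + 1)

# Simulated example (an assumption): x_t copies y_{t-1}, flipped with prob. 0.1.
rng = random.Random(0)
T = 2000
y = [rng.randint(0, 1) for _ in range(T)]
x = [rng.randint(0, 1)] + [y[t - 1] ^ int(rng.random() < 0.1) for t in range(1, T)]
observed, p_value = permutation_pvalue(x, y)
```

Plug-in estimates are biased upward for short sequences, which is one reason to calibrate the statistic against permuted data rather than against zero.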

The differential entropy of a Normal distribution with standard deviation σ is given by
$\log(\sigma\sqrt{2\pi e})$ [29]. This implies that when the error of the observed data relative to
some model is assumed to be Normally distributed, the Shannon entropy is a monotone
function of the root mean squared error, in correspondence with common definitions of
Granger causality.
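The formula can be checked numerically: for samples from a Normal distribution, the average of $-\log p(X)$ should approach $\log(\sigma\sqrt{2\pi e})$. The sketch below works in nats and, as an assumption for illustration, plugs the RMSE of a residual sequence into the same formula, which is how a Normal error model turns an entropy comparison into an RMSE comparison.

```python
import math
import random

def normal_entropy(sigma):
    """Differential entropy (nats) of N(mu, sigma^2): log(sigma * sqrt(2*pi*e))."""
    return math.log(sigma * math.sqrt(2 * math.pi * math.e))

def monte_carlo_entropy(sigma, samples=20000, seed=0):
    """Estimate -E[log p(X)] for X ~ N(0, sigma^2) by sampling."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        x = rng.gauss(0.0, sigma)
        total += 0.5 * math.log(2 * math.pi * sigma ** 2) + x * x / (2 * sigma ** 2)
    return total / samples

def entropy_from_residuals(residuals):
    """Entropy of a Normal error model whose sigma is the RMSE of the residuals."""
    rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return normal_entropy(rmse)
```

For example, `normal_entropy(1.0)` is $\tfrac12\ln(2\pi e) \approx 1.4189$ nats, and a smaller RMSE always gives a smaller entropy.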

When it is assumed that P is a good model for the data (in a frequentist
interpretation, this means that for repeated observations the data are distributed
according to P), then ideal on-line data compression is, with overwhelming probability,
performed by the Shannon-Fano code [20]. The expected difference between the code length
of the on-line Shannon-Fano code and that of the unconditioned code is given by the Shannon
information transfer. The expected code length for optimal on-line encoding is given
by enumerable information transfer within small terms 0. Therefore, mean Shannon
information transfer and mean enumerable information transfer are equal within some
constant. A formal version of this statement is given by Proposition 4.1.

Proposition 4.1.

$$\sum_{x,y} P(x,y)\,EIT(x \leftarrow y) =^+ \sum_{x,y} P(x,y)\,AIT(x \leftarrow y) =^+ SIT_P(X \leftarrow Y) \pm K(P)$$

Proof. The right equality has the same structure as the proof of ^ISj Lemma II. 4]. 
The left equality is shown in [5]. □ 

4.2 Minimal sufficient statistics and ideal Shannon information transfer

Algorithms for extracting P from x, y as in the previous subsection are often designed
to let P model as many as possible of the properties that appear frequently within the
signal, while at the same time keeping the descriptional complexity of P low.

To idealize this procedure, it has been conjectured [18] that in the case of
multivariate models, the constructed P should be chosen as a probabilistic minimal
sufficient statistic of the data x, y [TJJ]. Two ways of assigning causal relationships
from such a P exist: either by computing $SIT_P$ or by extracting a graphical schema
from P. In [18] it is argued informally that for the multivariate case the minimal
Bayesian network is a minimal sufficient statistic. In [18, Lemma 4], it is claimed
that if a two-part code satisfying some syntactical form results in an incompressible
string, the first part is the probabilistic minimal sufficient statistic. However, it is
argued here that in many cases, any plausible graphical causal representation cannot
be contained in a probabilistic minimal sufficient statistic of the data. If the plain
Kolmogorov complexity of the weak minimal sufficient statistic³ is computable from
n (for example, if it is relatively small), and the causal structure of the observables is
not equivalent to some initial segment of the Halting sequence, then the causal
information cannot be in the minimal sufficient statistic. To show this, assume that
the plain complexity of the weak minimal sufficient statistic is computable from n; it
follows from $C(P) =^+ K(P|C(P)) =^+ K(P)$ [20, Lemma 3.1] that the weak minimal
sufficient statistic is also a minimal sufficient statistic. From ^ Proposition 6.5 and
Proposition 7.6] it follows that every weak minimal sufficient statistic is equivalent
to some initial segment of the Halting sequence. Therefore, the minimal sufficient
statistic cannot include the causal information.

In the proof of Proposition 4.2, a more formal argument will be given for the
bivariate case, by providing a more explicit construction of two pairs of strings that
have very different plausible causal relationships but the same minimal
sufficient statistic. For these pairs, the enumerable information transfers do represent
the plausible causal relationships.

Proposition 4.2. There are strings x, y and x', y' for which the same P is a minimal
length-conditional sufficient statistic, such that

$$EIT(x \leftarrow y) =^+ 0, \qquad EIT(x' \leftarrow y') =^+ n - \log n,$$
$$EIT(y \leftarrow x) =^+ n - \log n, \qquad EIT(y' \leftarrow x') =^+ 0.$$

Proof. Let $k = \log n$ and let

$$t_k = \min\{t : m(\epsilon,\epsilon) - m_t(\epsilon,\epsilon) \le 2^{-k}\}.$$

Note that $t_k$ can be computed from n and the first k bits of $m(\epsilon,\epsilon)$; thus $K(t_k) \le^+
k$. Let $a \in 2^n$ be the lexicographically first string with $K^{t_k}(a) \ge n$. Let $x = x'$ be such that
$K(x|a^*) \ge^+ n$ and let

$$y = XOR(a, x_2\ldots x_n 0), \qquad y' = XOR(a, 0x_1\ldots x_{n-1}).$$

Than it follows that Kt^^^j^^^{x,y) ^+ |n. Moreover, by [4, Lemma 7.6] 

^+ 2n 

Since $K(x,y) \le^+ n + k$, this shows that the m-depth is k. By [SJ Proposition ?.?] it
follows that $P_{xy}$ as constructed there is a minimal sufficient statistic of x, y. The same
reasoning shows that $P_{xy}$ is also a minimal sufficient statistic of x', y'.

It remains to show the four inequalities of the Proposition. Note that
$K(x) =^+ n$, and also

$$K(y) =^+ K(x,y) - K(x|y^*) \ge^+ K(x,a) - K(a) =^+ K(x|a^*) =^+ n,$$

since also $K(y) \le^+ l(y) = n$. Note that

$$K(y|x\uparrow) \le^+ K(a) = k, \qquad K(x|y\uparrow) \le^+ l(x) = n,$$

and since

$$n + k \ge^+ K(x|y\uparrow) + K(y|x\uparrow) \ge^+ K(x,y) \ge^+ n + k,$$

this shows that

$$K(y|x\uparrow) =^+ k, \qquad K(x|y\uparrow) =^+ n.$$

An analogous argument shows the inequalities for x', y'. □
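The construction in the proof shifts x by one position in opposite directions before masking it with a. The short Python sketch below makes the index shift explicit. It uses random bits as placeholders for a and x; the string a of the proof (the lexicographically first string of high time-bounded complexity) is not computed here, and the helper names are hypothetical.

```python
import random

def xor(a, b):
    """Bitwise XOR of two equal-length bit lists."""
    return [ai ^ bi for ai, bi in zip(a, b)]

def construct(a, x):
    """Return y = XOR(a, x_2...x_n 0) and y' = XOR(a, 0 x_1...x_{n-1})."""
    y = xor(a, x[1:] + [0])
    y_prime = xor(a, [0] + x[:-1])
    return y, y_prime

# Placeholder inputs: random bits stand in for the strings a and x of the proof.
rng = random.Random(3)
n = 16
a = [rng.randint(0, 1) for _ in range(n)]
x = [rng.randint(0, 1) for _ in range(n)]
y, y_prime = construct(a, x)
# Unmasking with a exposes x shifted one step in opposite directions:
# y_i XOR a_i = x_{i+1}, while y'_i XOR a_i = x_{i-1}.
```

Given a, each pair is a relabelling of x; only the direction of the shift differs, which is what separates the two causal readings while the statistics of the pairs stay alike.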
The argument can be extended to show the analogue of Proposition 4.2 for the
multivariate case with a complex incompressible causal structure that contains no
Halting information.

5 Conclusion 

Ratio tests of universal semimeasures are defined and interpreted as an ideal solution
to scientific composite hypothesis testing if an unlimited amount of computation can
be assumed. Such universal elements exist if the hypotheses correspond to a convex
set or a product of convex sets of semimeasures, as for example in the hypothesis of
independent sources for pairs of observations.

Using generalized structural equations, different hypotheses of influence and
causality can be defined. These hypotheses define sets of enumerable semimeasures that have
a universal element, and therefore they define different statistical tests. The detailed
derivation of the statistical tests shows that there can be substantial differences in
the corresponding confidences depending on the presumed directions of instantaneous
information flows.

Associated causal semimeasures define a larger set of causal semimeasures that are
neither enumerable nor have a universal element. However, for the set of semimeasures
associated with universal semimeasures, it is not clear whether a universal element
exists, and consequently it is not clear whether they define some natural independence
tests. However, these tests can define ideal influence tests without assumptions
on instantaneous information transfer. The different relations are summarized in Figure 3.

Finally, the ideal methods of information transfer are contrasted with practical
methods from the literature. The method is also contrasted with the use of minimal
sufficient statistics, and it is shown that enumerable information transfer can describe
plausible causal relations where minimal sufficient statistics cannot.

An upcoming paper provides a relatively tight coding theorem for these causal
semimeasures, which is exact within the logarithm of some notion of computational
depth.
Prefix-free Kolmogorov complexity

For excellent introductions to Kolmogorov complexity we refer to [12, 20]. The
definitions here differ in that Kolmogorov complexities are conditioned on the
parameter n, in most cases representing the length of the first argument.
An interpreter Φ is a partial computable function:

$$\Phi : \omega \times 2^{<\omega} \times \omega^{<\omega} \to \omega^{<\omega} : t, p, x \mapsto \Phi_t(p|x),$$

and $\Phi(p|x) = \lim_{t\to\infty}\Phi_t(p|x)$. The use of $\omega^{<\omega}$ in this definition is to allow Φ to have
multiple inputs and outputs in ω, associated with $2^{<\omega}$. An interpreter is prefix-free
if for any x, the set of all p where $\Phi(p|x)$ is defined is prefix-free. Among the
prefix-free Turing machines, there are machines that can simulate any computation
on any other prefix-free machine by prefixing p with a finite binary string. These
machines are called optimal universal Turing machines. Let Φ be some fixed optimal
universal prefix-free interpreter.

For some $n \in \omega$ and $x, y \in \omega^{<\omega}$, the Kolmogorov complexity $K(x|y)$ is defined
as:

$$K_t(x|y) = \min\{l(p) : \Phi_t(p|y, n) = x\},$$
$$K_t(x) = K_t(x|\epsilon).$$

$K(x|y)$ and $K(x)$ are obtained by taking the limit in t.

Some properties of length-conditional prefix-free Kolmogorov complexity for $x \in 2^n$:

$$K(x) \le^+ n,$$
$$K(x) =^+ K(K(x), x).$$

Prefix-free Kolmogorov complexity is additive:

$$K(x,y) =^+ K(x) + K(y|x^*), \qquad (21)$$

where $x^*$ is a program of length $K(x)$ that outputs x.
For $x, y \in \omega^{<\omega}$ and $n \in \omega$,

$$x \to y$$

means that there is a program $p_x$ with $l(p_x) \le O(1)$ such that $\Phi(p_x|y, n) = x$.
Note that Φ is also conditioned on n. Also note that if $x \to y$, then $K(x) \le^+
K(y)$.

Lemma 5.1. For any $w, p \in \omega^{<\omega}$ with $\Phi(p) = w$ and $l(p) \le^+ K(w)$ we have

$$w^* \leftrightarrow p \leftrightarrow w, K(w). \qquad (22)$$

Proof. See [1, Lemma 4.2]. □
Let m be a universal semimeasure. The coding theorem states:

$$-\log m(x|y) =^+ K(x|y).$$